I am trying to construct a summed area table (integral image) from an image matrix. For those of you who don't know what it is, from Wikipedia:
A summed area table (also known as an integral image) is a data structure and algorithm for quickly and efficiently generating the sum of values in a rectangular subset of a grid
In other words, it's used to sum up the values of any rectangular region in the image/matrix in constant time.
I am trying to implement this in R. However, my code seems to take too long to run.
Here is the pseudocode from this link. in is the input matrix or image and intImg is what's returned
for i=0 to w do
sum←0
for j=0 to h do
sum ← sum + in[i, j]
if i = 0 then
intImg[i, j] ← sum
else
intImg[i, j] ← intImg[i − 1, j] + sum
end if
end for
end for
And here is my implementation
w = ncol(im)
h = nrow(im)
intImg = c(NA)
length(intImg) = w*h
for(i in 1:w){ #x
sum = 0;
for(j in 1:h){ #y
ind = ((j-1)*w)+ (i-1) + 1 #index
sum = sum + im[ind]
if(i == 1){
intImg[ind] = sum
}else{
intImg[ind] = intImg[ind-1]+sum
}
}
}
intImg = matrix(intImg, h, w, byrow=T)
Example of input and output matrix:
However, on a 480x640 matrix, this takes ~ 4 seconds. In the paper they describe it to take on the order of milliseconds for those dimensions.
Am I doing something inefficient in my loops or indexing?
I considered writing it in C++ and wrapping it in R, but I am not very familiar with C++.
Thank you
You could try apply (note that it isn't faster than your for loops if you pre-allocate the memory):
areaTable <- function(x) {
return(apply(apply(x, 1, cumsum), 1, cumsum))
}
areaTable(m)
# [,1] [,2] [,3] [,4]
# [1,] 4 5 7 9
# [2,] 4 9 12 17
# [3,] 7 13 16 25
# [4,] 9 16 22 33
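For reference, the same table can be built by running cumsum along each dimension in turn, after which any rectangular sum needs only four lookups. The following is a minimal sketch in base R; the names sat_padded and rect_sum are made up for illustration, and it assumes x is a numeric matrix with at least two rows and two columns.

```r
# Summed-area table with an extra leading row and column of zeros,
# so that lookups at "index 0" need no special case.
sat_padded <- function(x) {
  col_sums <- apply(x, 2, cumsum)       # cumulative sums down each column
  S <- t(apply(col_sums, 1, cumsum))    # then along each row (apply() transposes, so undo it)
  rbind(0, cbind(0, S))
}

# Sum of x[r1:r2, c1:c2] from the padded table P using four lookups.
rect_sum <- function(P, r1, c1, r2, c2) {
  P[r2 + 1, c2 + 1] - P[r1, c2 + 1] - P[r2 + 1, c1] + P[r1, c1]
}

x <- matrix(1:12, nrow = 3)
P <- sat_padded(x)
rect_sum(P, 2, 2, 3, 4) == sum(x[2:3, 2:4])   # TRUE
```

The padding is done once when the table is built, so each later query stays constant time.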
I keep ending up with a matrix populated entirely by 20s. The loop iterates over the numbers and through the indices of the matrix M, but it overwrites every cell each time, whereas I am looking for a 10x2 matrix containing only unique values.
n = 20;
M = matrix(NA, ncol = 2, nrow = 10);
a = 1
b = 1
for (i in 1:n){
for (r in 1:nrow(M))
for (c in 1:ncol(M))
i -> M[r,c]
print(M)
}
M
I would suggest that if the outer for-loop is indexed by increasing values to be entered as elements into the matrix, then the value of i should be used to decide which position each one goes to. (If the values were not sequential, you could use the result of seq_along(your_non_consecutive_variable) as the index for the loop and as the way to pick the value to be entered into the matrix.) You CANNOT work with a single value set at the outer loop and then repeat an assignment of that value multiple times with two nested inner loops.
n = 20;
M = matrix(NA, ncol = 2, nrow = 10);
a = 1
b = 1
for (i in 1:n){
if( i <= 10){ M[i, 1] <- i} else
{ M[i-10, 2] <- i}}
M
#---------
[,1] [,2]
[1,] 1 11
[2,] 2 12
[3,] 3 13
[4,] 4 14
[5,] 5 15
[6,] 6 16
[7,] 7 17
[8,] 8 18
[9,] 9 19
[10,] 10 20
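The same idea, using i to decide the position, can also be written without the if/else by computing the row and column from i with integer arithmetic. This is only a sketch; the names r and k are illustrative, and it assumes the values fill the matrix column by column.

```r
n <- 20
M <- matrix(NA, nrow = 10, ncol = 2)
for (i in 1:n) {
  r <- (i - 1) %% nrow(M) + 1    # row cycles through 1..10
  k <- (i - 1) %/% nrow(M) + 1   # column moves on after every 10 values
  M[r, k] <- i
}
M
```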
That said this is only to be used as an exercise in understanding for-loops. A more R-ish way of putting values into a matrix would be:
var <- sample(1:20)
M <- matrix(var, nrow = 10, ncol = 2)
The values in var get assigned to rows 1:10 in the first column and then rows 1:10 in the second column. R handles its matrix indexing in a column major fashion. This is important to understand when working with the results of sapply operations.
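A small illustration of the column-major point, using only base R (the names by_col and by_row are made up for this example):

```r
v <- 1:20
by_col <- matrix(v, nrow = 10, ncol = 2)                # default: fill down the columns
by_row <- matrix(v, nrow = 10, ncol = 2, byrow = TRUE)  # fill along the rows instead

by_col[1, ]   # 1 11
by_row[1, ]   # 1 2
by_col[13]    # 13, because a single index also walks down the columns
```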
In Excel, it's easy to perform a calculation on a previous cell by referencing that earlier cell. For example, starting from an initial value of 100 (step = 0), each next step would be 0.9 * previous + 9 simply by dragging the formula bar down from the first cell (step = 1). The next 10 steps would look like:
step value
[1,] 0 100.00000
[2,] 1 99.00000
[3,] 2 98.10000
[4,] 3 97.29000
[5,] 4 96.56100
[6,] 5 95.90490
[7,] 6 95.31441
[8,] 7 94.78297
[9,] 8 94.30467
[10,] 9 93.87420
[11,] 10 93.48678
I've looked around the web and StackOverflow, and the best I could come up with is a for loop (below). Are there more efficient ways to do this? Is it possible to avoid a for loop? It seems like most functions in R (such as cumsum, diff, apply, etc) work on existing vectors instead of calculating new values on the fly from previous ones.
#for loop. This works
value <- 100 #Initial value
for(i in 2:11) {
current <- 0.9 * value[i-1] + 9
value <- append(value, current)
}
cbind(step = 0:10, value) #Prints the example output shown above
It seems like you're looking for a way to do recursive calculations in R. Base R has two ways of doing this which differ by the form of the function used to do the recursion. Both methods could be used for your example.
Reduce can be used with recursion equations of the form v[i+1] = function(v[i], x[i]) where v is the calculated vector and x an input vector; i.e. where the i+1 output depends only on the i-th values of the calculated and input vectors, and the calculation performed by function(v, x) may be nonlinear. For your case, this would be
value <- 100
nout <- 10
# v[i+1] = function(v[i], x[i])
v <- Reduce(function(v, x) .9*v + 9, x=numeric(nout), init=value, accumulate=TRUE)
cbind(step = 0:nout, v)
filter is used with recursion equations of the form y[i] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p] where y is the calculated vector and x an input vector; i.e. where the output can depend linearly upon lagged values of the calculated vector as well as the i-th value of the input vector. For your case, this would be:
value <- 100
nout <- 10
# y[i] = x[i] + filter[1]*y[i-1] + ... + filter[p]*y[i-p]
y <- c(value, stats::filter(x=rep(9, nout), filter=.9, method="recursive", sides=1, init=value))
cbind(step = 0:nout, y)
For both functions, the length of the output is given by the length of the input vector x.
Both of these approaches give your result.
Use our knowledge about the geometric series.
i <- 0:10
0.9 ^ i * 100 + 9 * (0.9 ^ i - 1) / (0.9 - 1)
#[1] 100.00000 99.00000 98.10000 97.29000 96.56100 95.90490 95.31441 94.78297 94.30467 93.87420 93.48678
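The closed form generalizes to any recurrence v[i+1] = a * v[i] + b with a != 1. Here is a hedged sketch that wraps it in a function and checks it against a plain pre-allocated loop; the name linear_recurrence and its arguments are invented for this example.

```r
# Closed form of v[i+1] = a * v[i] + b starting from v0:
#   v[i] = a^i * v0 + b * (a^i - 1) / (a - 1),  valid for a != 1
linear_recurrence <- function(a, b, v0, n) {
  stopifnot(a != 1)
  i <- 0:n
  a^i * v0 + b * (a^i - 1) / (a - 1)
}

closed <- linear_recurrence(0.9, 9, 100, 10)

# Reference: the same values from a loop over a pre-allocated vector.
looped <- numeric(11)
looped[1] <- 100
for (i in 2:11) looped[i] <- 0.9 * looped[i - 1] + 9

all.equal(closed, looped)   # TRUE
```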
You could also use purrr::accumulate:
data.frame(value = purrr::accumulate(1:10, ~ .x * .9 + 9, .init = 100))
value
1 100.00000
2 99.00000
3 98.10000
4 97.29000
5 96.56100
6 95.90490
7 95.31441
8 94.78297
9 94.30467
10 93.87420
11 93.48678
.init is the initial value and there is also the argument .dir if you want to control the direction ("forward" is the default)
I am trying to generate n random numbers whose sum is less than 1.
So I can't just run runif(3). But I can condition each iteration on the sum of all values generated up to that point.
The idea is to start with an empty vector, v, and set up a loop such that on each iteration, i, a runif() value is generated, but before it is accepted as an element of v, i.e. v[i] <- runif(), the test sum(v) > 1 is carried out. If FALSE, the entry v[i] is accepted; if TRUE, that is, the sum is greater than 1, v[i] is tossed out of the vector and iteration i is repeated.
I am far from implementing this idea, but I would like to resolve it along the lines of something similar to what follows. It's not so much a practical problem, but more of an exercise to understand the syntax of loops in general:
n <- 4
v <- 0
for (i in 1:n){
rdom <- runif(1)
if((sum(v) + rdom) < 1) v[i] <- rdom
# keep trying before moving on to iteration i + 1???? i <- stays i?????
}
I have looked into while (actually I incorporated the while function in the title); however, I need the vector to have n elements, so I get stuck if I try something that basically tells R to add random uniform realizations as elements of the vector v while sum(v) < 1, because I can end up with fewer than n elements in v.
Here's a possible solution. It originally used the more generic repeat; I edited it to use while instead, which saves a couple of lines.
set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
while (i < n) {
ith <- runif(1)
if (sum(c(v, ith)) < 1) {
i <- i+1
v[i] <- ith
}
}
v
# [1] 0.89669720 0.06178627 0.01339033 0.02333120
Using a repeat block, you must check for the condition anyways, but, removing the growing problem, it would look very similar:
set.seed(0)
n <- 4
v <- numeric(n)
i <- 0
repeat {
ith <- runif(1)
if (sum(c(v, ith)) < 1) {
i <- i+1
v[i] <- ith
}
if (i == n) break
}
If you really want to keep exactly the same procedure that you have posted (aka iteratively sample the n values one at a time from the standard uniform distribution, rejecting any samples that cause your sum to exceed 1), then the following code is mathematically equivalent, shorter, and more efficient:
samp <- function(n) {
v <- rep(0, n)
for (i in 1:n) {
v[i] <- runif(1, 0, 1-sum(v))
}
v
}
Basically, this code uses the mathematical fact that if the sum of the vector is currently sum(v), then sampling from the standard uniform distribution until you get a value no greater than 1-sum(v) is exactly equivalent to sampling in the uniform distribution from 0 to 1-sum(v). The advantage of using the latter approach is that it's much more efficient -- we don't need to keep rejecting samples and trying again, and can instead just sample once for each element.
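Since each draw is just a standard uniform value scaled by whatever is left of the budget, the same scheme can also be written without a loop. This is a sketch only, under the assumption that scaling runif(1) by 1-sum(v) is an acceptable stand-in for runif(1, 0, 1-sum(v)); the name samp_vec is made up for illustration.

```r
samp_vec <- function(n) {
  u <- runif(n)
  # Budget left before each draw: 1, (1 - u1), (1 - u1)(1 - u2), ...
  remaining <- c(1, head(cumprod(1 - u), -1))
  u * remaining
}

set.seed(144)
v <- samp_vec(10)
sum(v) < 1   # TRUE: the total is 1 - prod(1 - u)
```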
To get a sense of the runtime differences, consider sampling 100 observations with n=10, comparing to a working implementation of the code from your post (copied from my other answer to this question):
OP <- function(n) {
v <- rep(0, n)
for (i in 1:n){
rdom <- runif(1)
while (sum(v) + rdom > 1) rdom <- runif(1)
v[i] <- rdom
}
v
}
set.seed(144)
system.time(samples.OP <- replicate(100, OP(10)))
# user system elapsed
# 261.937 1.641 265.805
system.time(samples.josliber <- replicate(100, samp(10)))
# user system elapsed
# 0.004 0.001 0.004
In this case, the new approach is approaching 100,000 times faster.
It sounds like you're trying to uniformly sample from a space of n variables where the following constraints hold:
x_1 + x_2 + ... + x_n <= 1
x_1 >= 0
x_2 >= 0
...
x_n >= 0
The "hit and run" algorithm is the mathematical machinery that enables you to do exactly this. In 2-dimensional space, the algorithm will sample uniformly from the following triangle, with each location in the shaded area being equally likely to be selected:
The algorithm is provided in R through the hitandrun package, which requires you to specify the linear inequalities that define the space through a constraint matrix, direction vector, and right-hand side vector:
library(hitandrun)
n <- 3
constr <- list(constr = rbind(rep(1, n), -diag(n)),
dir = c(rep("<=", n+1)),
rhs = c(1, rep(0, n)))
set.seed(144)
samples <- hitandrun(constr, n.samples=1000)
head(samples, 10)
# [,1] [,2] [,3]
# [1,] 0.28914690 0.01620488 0.42663224
# [2,] 0.65489979 0.28455231 0.00199671
# [3,] 0.23215115 0.00661661 0.63597912
# [4,] 0.29644234 0.06398131 0.60707269
# [5,] 0.58335047 0.13891392 0.06151205
# [6,] 0.09442808 0.30287832 0.55118290
# [7,] 0.51462261 0.44094683 0.02641638
# [8,] 0.38847794 0.15501252 0.31572793
# [9,] 0.52155055 0.09921046 0.13304728
# [10,] 0.70503030 0.03770875 0.14299089
Breaking down this code a bit, we generated the following constraint matrix:
constr
# $constr
# [,1] [,2] [,3]
# [1,] 1 1 1
# [2,] -1 0 0
# [3,] 0 -1 0
# [4,] 0 0 -1
#
# $dir
# [1] "<=" "<=" "<=" "<="
#
# $rhs
# [1] 1 0 0 0
Reading across the first line of constr$constr we have 1, 1, 1 which indicates "1*x1 + 1*x2 + 1*x3". The first element of constr$dir is <=, and the first element of constr$rhs is 1; putting it together we have x1 + x2 + x3 <= 1. From the second row of constr$constr we read -1, 0, 0 which indicates "-1*x1 + 0*x2 + 0*x3". The second element of constr$dir is <= and the second element of constr$rhs is 0; putting it together we have -x1 <= 0 which is the same as saying x1 >= 0. The similar non-negativity constraints follow in the remaining rows.
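To make the reading of the constraint list concrete, here is a small base R check of whether a given point satisfies it. It does not need the hitandrun package; the helper name is_feasible is invented for this sketch, and it assumes every direction is "<=" as in the list above.

```r
n <- 3
constr <- list(constr = rbind(rep(1, n), -diag(n)),
               dir = rep("<=", n + 1),
               rhs = c(1, rep(0, n)))

# TRUE when every row satisfies (row of constr) %*% x <= rhs.
is_feasible <- function(constr, x) {
  stopifnot(all(constr$dir == "<="))
  all(constr$constr %*% x <= constr$rhs)
}

is_feasible(constr, c(0.2, 0.3, 0.4))    # TRUE:  sum is 0.9, all non-negative
is_feasible(constr, c(0.5, 0.5, 0.5))    # FALSE: sum exceeds 1
is_feasible(constr, c(-0.1, 0.2, 0.3))   # FALSE: x1 is negative
```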
Note that the hit and run algorithm has the nice property of having the exact same distribution for each of the variables:
hist(samples[,1])
hist(samples[,2])
hist(samples[,3])
Meanwhile, the distribution of the samples from your procedure will be highly uneven, and as n increases this problem will get worse and worse.
OP <- function(n) {
v <- rep(0, n)
for (i in 1:n){
rdom <- runif(1)
while (sum(v) + rdom > 1) rdom <- runif(1)
v[i] <- rdom
}
v
}
samples.OP <- t(replicate(1000, OP(3)))
hist(samples.OP[,1])
hist(samples.OP[,2])
hist(samples.OP[,3])
An added advantage is that the hit-and-run algorithm appears faster -- I generated these 1000 replicates in 0.006 seconds on my computer with hit-and-run and it took 0.3 seconds using the modified code from the OP.
Here's how I would do it, without any loop, if or while:
set.seed(123)
x <- runif(1) # start with the sum that you want to obtain
n <- 4 # number of generated random numbers, can be chosen arbitrarily
y <- sort(runif(n-1,0,x)) # choose n-1 random points to cut the range [0:x]
z <- c(y[1],diff(y),x-y[n-1]) # result: determine the length of the segments
#> z
#[1] 0.11761257 0.10908627 0.02723712 0.03364156
#> sum(z)
#[1] 0.2875775
#> all.equal(sum(z),x)
#[1] TRUE
The advantage here is that you can determine exactly which sum you want to obtain and how many numbers n you want to generate for this. If you set, e.g., x <- 1 in the second line, the n random numbers stored in the vector z will add up to one.
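The same steps can be wrapped in a small function. This is a sketch of the approach above rather than a drop-in replacement; the name random_parts and its arguments are hypothetical.

```r
# Split `total` into n non-negative pieces by cutting [0, total] at n - 1 random points.
random_parts <- function(n, total) {
  cuts <- sort(runif(n - 1, 0, total))
  diff(c(0, cuts, total))   # lengths of the segments between consecutive cuts
}

set.seed(123)
z <- random_parts(4, 1)
all.equal(sum(z), 1)   # TRUE
```

Passing a total drawn from runif(1), as in the code above, gives pieces whose sum is below 1.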
In a tutorial on for() loops I came across the following exercise:
Exercise 4.4. Write a function to perform matrix-vector multiplication. It should take a matrix A and a vector b as arguments, and return the vector Ab. Use two loops to do this, rather than %*% or any vectorization.
Let's say I use a specific matrix A (dim: 3,3) and a vector b (length 3).
> # Ex 4.4
> out<-c(1,1,1)
> Ab<-function(A,b) {
+ for(i in 1:dim(A)[1]) {
+
+ out[i]=sum(A[i,]*b)
+ }
+ out
+ }
> a = c(1,1,1)
> A
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 10
> a
[1] 1 1 1
> Ab(A,a)
[1] 12 15 19
This works for a very specific case, i.e. a matrix with 3 rows and a vector of length 3, but leaves much to be desired. I don't know what a good solution to this exercise would be, but the question says 'use two loops'. Suggestions will be much appreciated.
thx
You are hiding the inner loop with A[i,]*b, which does vectorized multiplication (i.e. a hidden loop). So, if you expand that out explicitly, you will have the two required loops.
Ab<-function(A,b) {
if (dim(A)[2] != NROW(b)) stop("wrong dimensions")
out <- matrix(, nrow(A), 1)
for(i in 1:dim(A)[1]) {
s <- 0
for (j in 1:dim(A)[2]) s <- s + A[i,j] * b[j]
out[i] <- s
}
out
}
I am learning R and reading the book Guide to programming algorithms in r.
The book gives an example function:
# MATRIX-VECTOR MULTIPLICATION
matvecmult = function(A,x){
m = nrow(A)
n = ncol(A)
y = matrix(0,nrow=m)
for (i in 1:m){
sumvalue = 0
for (j in 1:n){
sumvalue = sumvalue + A[i,j]*x[j]
}
y[i] = sumvalue
}
return(y)
}
How do I call this function in the R console? And what exactly is passed into this function as A and x?
The function takes an argument A, which should be a matrix, and x, which should be a numeric vector of same length as values per row in A.
If
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
[,1] [,2] [,3]
[1,] 1 3 5
[2,] 2 4 6
then you have 3 values (number of columns, ncol) per row, thus x needs to be something like
x <- c(4,5,6)
The function itself iterates over all rows, and in each row, each value is multiplied with a value from x: the value in the first column is multiplied with the first value in x, the value in A's second column is multiplied with the second value in x, and so on. This is repeated for each row, and the sum for each row is returned by the function.
matvecmult(A, x)
[,1]
[1,] 49 # 1*4 + 3*5 + 5*6
[2,] 64 # 2*4 + 4*5 + 6*6
To run this function, you first have to compile (source) it and then consecutively run these three code lines:
A <- matrix(c(1,2,3,4,5,6), nrow = 2, ncol = 3)
x <- c(4,5,6)
matvecmult(A, x)
This function is designed to return the product of a matrix A with a vector x; i.e. the result will be the matrix product A x (where - as is usual in R, the vector is a column vector). An example should make things clear.
# define a matrix
mymatrix <- matrix(sample(12), nrow = 4)
# see what the matrix looks like
mymatrix
# [,1] [,2] [,3]
# [1,] 2 10 9
# [2,] 3 1 12
# [3,] 11 7 5
# [4,] 8 4 6
# define a vector where multiplication of our matrix times the vector will be defined
vec3 <- c(-1,0,1)
# apply the function to our matrix and vector
result <- matvecmult(mymatrix, vec3)
result
# [,1]
# [1,] 7
# [2,] 9
# [3,] -6
# [4,] -2
class(result)
# [1] "matrix"   (R >= 4.0.0 prints "matrix" "array")
So matvecmult(mymatrix, vec3) is how you would call this function, and the result is an n by 1 matrix, where n is the number of rows in the matrix argument.
You can also get some insight by playing around and seeing what happens when you pass something other than a matrix-vector pair where the product is defined. In some cases, you will get an error; sometimes you get nonsense; and sometimes you get something you might not expect just from the function name. See what happens when you call matvecmult(mymatrix, mymatrix).
The function calculates the product of a matrix and a column vector. It assumes the number of columns of the matrix is equal to the number of elements in the vector.
It stores the number of columns of A in n and the number of rows in m.
It then initializes a matrix of m rows with all values set to 0.
It iterates along the rows of A and multiplies each value in each row with the corresponding value in x.
The sum for each row is then stored in y, and finally it returns the single-column matrix y.
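The steps above can be sketched with an explicit dimension check, so that mismatched inputs fail early instead of returning something unexpected. The name matvec_checked is made up here; this is a variation on the book's function, not the book's own code.

```r
matvec_checked <- function(A, x) {
  # x must be a plain vector with one element per column of A
  stopifnot(is.matrix(A), is.null(dim(x)), ncol(A) == length(x))
  y <- matrix(0, nrow = nrow(A))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(A))) {
      y[i] <- y[i] + A[i, j] * x[j]
    }
  }
  y
}

A <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
x <- c(4, 5, 6)
all.equal(matvec_checked(A, x), A %*% x)   # TRUE
```

With this check, a call such as matvec_checked(A, A) stops with an error rather than silently using the first few elements of the second argument.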