get an element from a subset of a matrix in Rarmadillo

I have a large-ish matrix. I'm trying to sample from it with dynamically changing weights. As this forces me to use loops in R, I'm trying to implement it in Rcpp so it has a chance of running a bit faster. After a bit of experimenting, I think I've figured out how to grab an index at random with the correct weights.
The trick is that I'm only sampling from a subset of columns at any given time (this can change to rows if it's more efficient in C - the matrix is actually symmetric). My indices are only defined for this subset of columns. In R, I'd do something along the lines of
large_matrix[, columns_of_interest][index]
and this works fine. How would I do the equivalent using Rcpp/Armadillo? My guess of
cppFunction("arma::vec element_from_subset(arma::mat D, arma::uvec i, arma::uvec columns) {
// arma::mat D_subset = D.cols(columns);
return D.cols(columns).elem(i);
}", depends = "RcppArmadillo")
fails to compile (and .at instead of .elem doesn't work either, nor does the standard R trick of surrounding things in parentheses).
This does work, but is what I'm trying to avoid:
cppFunction("arma::vec element_from_subset(arma::mat D, arma::uvec i, arma::uvec columns) {
arma::mat D_subset = D.cols(columns);
return D_subset.elem(i);
}", depends = "RcppArmadillo")
Is there any way to accomplish this without needing to save D.cols(columns)?

Short answer: No.
But, the problem is phrased incorrectly. Think about what is happening here:
(M <- matrix(1:9, 3, 3))
#> [,1] [,2] [,3]
#> [1,] 1 4 7
#> [2,] 2 5 8
#> [3,] 3 6 9
columns_of_interest = 1:2
M[, columns_of_interest]
#> [,1] [,2]
#> [1,] 1 4
#> [2,] 2 5
#> [3,] 3 6
From here, if we have the index being 1, then we get:
index = 1
M[, columns_of_interest][index]
#> 1
So, in essence, what's really happening is an entry-wise subset of (i,j). Thus, you should just use:
Rcpp::cppFunction("double element_from_subset(arma::mat D, int i, int j) {
return D(i, j);
}", depends = "RcppArmadillo")
element_from_subset(M, 0, 0)
#> [1] 1
I say this based on the R and C++ code posted, e.g. R gives 1 value and C++ has a return type permitting only one value.
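If you do need to go from a linear index into the column subset back to an (i, j) pair in the full matrix, the arithmetic can be sketched in plain R. This is only an illustration: the helper name subset_index_to_ij is made up here, and it assumes R's column-major, one-based indexing. For Armadillo you would subtract 1 from both the row and the column before calling D(i, j).

```r
# Hypothetical helper (not from any package): map a linear index into
# M[, cols] to the matching (row, col) position in the full matrix M.
subset_index_to_ij <- function(M, cols, index) {
  n <- nrow(M)
  row <- (index - 1L) %% n + 1L            # position within the column
  col <- cols[(index - 1L) %/% n + 1L]     # which selected column it falls in
  cbind(row = row, col = col)
}

M <- matrix(1:9, 3, 3)
cols <- c(1L, 3L)
index <- c(2L, 4L, 6L)

ij <- subset_index_to_ij(M, cols, index)
via_subset <- M[, cols][index]   # the original two-step R idiom
via_ij <- M[ij]                  # direct (row, col) lookup in M
identical(via_subset, via_ij)
```

Indexing M with a two-column matrix picks out one element per row of ij, so no intermediate copy of the column subset is needed.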
The code posted by OP is shown without the error. The initial error as compiled will indicate there is an issue using an Rcpp object inside of an arma class. If we correct the types, e.g. replacing Rcpp::IntegerVector with an arma appropriate type of either arma::ivec or arma::uvec, then compiling yields a more informative error message.
Corrected Code:
Rcpp::cppFunction("double element_from_subset(arma::mat D, int i, arma::uvec columns) {
return D.cols(columns).elem(i);
}", depends = "RcppArmadillo")
Error Message:
file6cf4cef8267.cpp:10:26: error: no member named 'elem' in 'arma::subview_elem2<double, arma::Mat<unsigned int>, arma::Mat<unsigned int> >'
return D.cols(columns).elem(i);
~~~~~~~~~~~~~~~ ^
1 error generated.
make: *** [file6cf4cef8267.o] Error 1
So, there is no way to subset a subview that was itself created by taking a subset of an Armadillo object.
You may want to read up on a few of the subsetting features of Armadillo. They are immensely helpful.
Rcpp Gallery: http://gallery.rcpp.org/articles/armadillo-subsetting
Guide to Converting R Code to Armadillo: http://thecoatlessprofessor.com/programming/common-operations-with-rcpparmadillo/
Armadillo specific documentation
Matrix subsets: http://arma.sourceforge.net/docs.html#submat
Individual entries: http://arma.sourceforge.net/docs.html#element_access
sub2ind(): http://arma.sourceforge.net/docs.html#sub2ind
ind2sub(): http://arma.sourceforge.net/docs.html#ind2sub
Disclaimer: Both the first and second links I've contributed to or written.

Related

Two errors in R code for QR decomposition using the Gram-Schmidt method, and I want to know why it went wrong

I wrote the following code by hand for QR decomposition using Gram-Schmidt orthogonalization:
A<-cbind(c(2,-2,18),c(2,1,0),c(1,2,0),c(2,3,4))
gsm<-function(X){
m<-ncol(X)
n<-nrow(X)
# initialize Q and R
q<-matrix(0,m,n)
r<-matrix(0,n,n)
v<-matrix(0,m,n)
# initialize V
v[,1]<-X[,1]
q[,1]<-v[,1]/sqrt(sum(v[,1]^2))
r[1,1]<-t(X[,1])%*%q[,1]
for (i in 2:n){
dv<-0
for (j in 1:(i-1)) {
r[j,i]<-t(X[,i])%*%q[,j]
dv<-dv+r[j,i]*q[,j]
}
v[,i]<-X[,i]-dv
q[,i]<-v[,i]/sqrt(t(v[,i])%*%v[,i])
r[i,i]<-t(X[,i])%*%q[,i]
}
qrreport<-list("Q"=q,"R"=r)
return(qrreport)
}
gsm(A)
However, the code doesn't work and gives me the error:
Error in v[, 1] <- X[, 1] : number of items to replace is not a multiple of replacement length
And when I replace A with a 3*3 matrix: A<-cbind(c(2,-2,18),c(2,1,0),c(1,2,0)) and run the function again, R now gives me the following warnings along with the output:
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
Recycling array of length 1 in vector-array arithmetic is deprecated.
Use c() or as.vector() instead.
$Q
[,1] [,2] [,3]
[1,] 0.1097643 0.89011215 -0.4423259
[2,] -0.1097643 0.45314800 0.8846517
[3,] 0.9878783 -0.04855157 0.1474420
$R
[,1] [,2] [,3]
[1,] 18.22087 0.1097643 -0.1097643
[2,] 0.00000 2.2333723 1.7964082
[3,] 0.00000 0.0000000 1.3269776
I am very confused about where I made mistakes and hope someone can help me debug.
Your A matrix has 3 rows and 4 columns, so in gsm() m is 4 and n is 3. That means v has 4 rows and 3 columns, whereas X, which is really A, only has 3 rows. When v[, 1] <- X[, 1] tries to put the first column of X (3 values) into the first column of v (4 slots), you get the error message you saw.
To debug things like this in RStudio, set a breakpoint on the line v[, 1] <- X[, 1] that caused the error, and look at the different items in the expression before executing it. If you're not using RStudio, you can still set a breakpoint there using the setBreakpoint function, but it's a lot more work.
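For reference, here is one way the function could be restructured. It is a sketch of classical Gram-Schmidt, not the only fix, and it assumes the columns of X are linearly independent (so it is shown on the 3*3 matrix, not the 3*4 one). Q gets the same shape as X, R is square in the number of columns, and inner products are computed with sum() so that no 1x1 matrix ends up in vector arithmetic, which is what triggers the "Recycling array of length 1" warning.

```r
gsm <- function(X) {
  n <- nrow(X)                 # rows of X
  m <- ncol(X)                 # columns of X
  q <- matrix(0, n, m)         # Q has the same shape as X
  r <- matrix(0, m, m)         # R is m x m, upper triangular
  for (i in seq_len(m)) {
    v <- X[, i]
    for (j in seq_len(i - 1)) {          # skipped entirely when i == 1
      r[j, i] <- sum(X[, i] * q[, j])    # plain scalar, not a 1x1 matrix
      v <- v - r[j, i] * q[, j]
    }
    r[i, i] <- sqrt(sum(v^2))
    q[, i] <- v / r[i, i]
  }
  list(Q = q, R = r)
}

A <- cbind(c(2, -2, 18), c(2, 1, 0), c(1, 2, 0))
res <- gsm(A)
all.equal(res$Q %*% res$R, A)   # Q %*% R should reproduce A
```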

Understanding code for custom in-place modification function?

I came across this post: http://r.789695.n4.nabble.com/speeding-up-perception-tp3640920p3646694.html from Matt Dowle, discussing some (early?) implementation ideas for the data.table package.
He uses the following code:
x = list(a = 1:10000, b = 1:10000)
class(x) = "newclass"
"[<-.newclass" = function(x,i,j,value) x # i.e. do nothing
tracemem(x)
x[1, 2] = 42L
Specifically I am looking at:
"[<-.newclass" = function(x,i,j,value) x
I am trying to understand what is done there and how I could use this notation.
It looks to me like:
i is the row index
j is column index
value is the value to be assigned
x is the object under consideration
My best guess would therefore be that I define a custom function for in-place modification (for a given class).
[<-.newclass is the in-place modification function for class newclass.
Understanding what happens:
Usually the following code should return an error:
x = list(a = 1:10000, b = 1:10000)
x[1, 2] = 42L
so I guess the sample code does not have any practical use.
Attempt to use the logic:
A simple nonsense attempt would be to square the value to be inserted:
x[i, j] <- value^2
Full try:
> x = matrix(1:9, 3, 3)
> class(x) = "newclass"
> "[<-.newclass" = function(x, i, j, value) x[i, j] <- value^2 # i.e. do something
> x[1, 2] = 9
Error: C stack usage 19923536 is too close to the limit
This doesn't seem to work.
My question(s):
"[<-.newclass" = function(x,i,j,value) x
How exactly does this notation work and how would I use it?
(I added the data.table tag since the linked discussion is about "by-reference" in-place modification in data.table, I think.)
The `[<-`() function is (traditionally) used for subassignment, and is, more broadly, a type of replacement function. It is also generic (more specifically, an internal generic), which allows you to write custom methods for it, as you correctly surmised.
Replacement functions
In general, when you call a replacement function, such as ...
foo(x) <- bar(y)
... the expression on the right hand side of <- (so here bar(y)) gets passed as a named value argument to `foo<-`() with x as the first argument, and the object x is reassigned with the result: that is, the said call is equivalent to writing:
x <- `foo<-`(x, value = bar(y))
So in order to work at all, all replacement functions must take at least two arguments, one of which must be named value.
Most replacement functions only have these two arguments, but there are also exceptions: such as `attr<-` and, typically, subassignment.
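As a small illustration of the two-argument case, here is a made-up replacement function (the name `second<-` is purely hypothetical) together with the equivalent explicit call:

```r
# A toy replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x            # a replacement function must return the modified object
}

x <- c(1, 2, 3)
second(x) <- 10                   # replacement-call syntax

y <- c(1, 2, 3)
y <- `second<-`(y, value = 10)    # what R evaluates, written out by hand

identical(x, y)
```

Both forms should leave the vector as 1, 10, 3; the first is just syntactic sugar for the second.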
Subassignment
When you have a subassignment call like x[i, j] <- y, i and j get passed as additional arguments to the `[<-`() function with x and y as the first and value arguments, respectively:
x <- `[<-`(x, i, j, value = y) # x[i, j] <- y
In the case of a matrix or a data.frame, i and j would be used for selecting rows and columns; but in general, this does not need to be the case. A method for a custom class could do anything with the arguments. Consider this example:
x <- matrix(1:9, 3, 3)
class(x) <- "newclass"
`[<-.newclass` <- function(x, y, z, value) {
x + (y - z) * value # absolute nonsense
}
x[1, 2] <- 9
x
#> [,1] [,2] [,3]
#> [1,] -8 -5 -2
#> [2,] -7 -4 -1
#> [3,] -6 -3 0
#> attr(,"class")
#> [1] "newclass"
Is this useful or reasonable? Probably not. But is it valid R code? Absolutely!
It's less common to see custom subassignment methods in real applications, as `[<-`() usually "just works" as you might expect it to, based on the underlying object of your class. A notable exception is `[<-.data.frame`, where the underlying object is a list, but subassignment behaves matrix-like. (On the other hand, many classes do need a custom subsetting method, as the default `[`() method drops most attributes, including the class attribute, see ?`[` for details).
As to why your example doesn't work: remember that you are writing a method for a generic function, and all the regular rules apply. If we use the functional form of `[<-`() and expand the method dispatch in your example, we can see immediately why it fails:
`[<-.newclass` <- function(x, i, j, value) {
x <- `[<-.newclass`(x, i, j, value = value^2) # x[i, j] <- value^2
}
That is, the function was defined recursively, without a base case, resulting in infinite recursion (hence the C stack error). One way to get around this would be to unclass(x) before calling the next method:
`[<-.newclass` <- function(x, i, j, value) {
x <- unclass(x)
x[i, j] <- value^2
x # typically you would also add the class back here
}
(Or, using a somewhat more advanced technique, the body could also be replaced with an explicit next method like this: NextMethod(value = value^2). This plays nicer with inheritance and superclasses.)
And just to verify that it works:
x <- matrix(1:9, 3, 3)
class(x) <- "newclass"
x[1, 2] <- 9
x
#> [,1] [,2] [,3]
#> [1,] 1 81 7
#> [2,] 2 5 8
#> [3,] 3 6 9
Perfectly confusing!
As for the context of Dowle's "do nothing" subassignment example, I believe this was to illustrate that back in R 2.13.0, a custom subassignment method would always cause a deep copy of the object to be made, even if the method itself did nothing at all. (This is no longer the case, since R 3.1.0 I believe.)
Created on 2018-08-15 by the reprex package (v0.2.0).

Calculate a geometric progression

I'm using brute force right now..
x <- 1.03
Value <- c((1/x)^20,(1/x)^19,(1/x)^18,(1/x)^17,(1/x)^16,(1/x)^15,(1/x)^14,(1/x)^13,(1/x)^12,(1/x)^11,(1/x)^10,(1/x)^9,(1/x)^8,(1/x)^7,(1/x)^6,(1/x)^5,(1/x)^4,(1/x)^3,(1/x)^2,(1/x),1,x,x^2,x^3,x^4,x^5,x^6,x^7,x^8,x^9,x^10,x^11,x^12,x^13,x^14,x^15,x^16,x^17,x^18,x^19,x^20)
Value
but I would like to use an increment loop just like the for loop in java
for(integer I = 1; I<=20; I++)
^ is a vectorized function in R. That means you can simply use x^(-20:20).
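A quick way to convince yourself that the one-liner matches the hand-written vector is to compare a few entries (just a sanity-check sketch; the name value is arbitrary):

```r
x <- 1.03
value <- x^(-20:20)               # exponents -20, -19, ..., 19, 20

length(value)                     # 41 entries, like the brute-force vector
all.equal(value[1], (1 / x)^20)   # first entry
value[21]                         # exponent 0, i.e. 1
all.equal(value[41], x^20)        # last entry
```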
Edit because this gets so many upvotes:
More precisely, both the base parameter and the exponent parameter are vectorized.
You can do this:
x <- 1:3
x^2
#[1] 1 4 9
and this:
2^x
#[1] 2 4 8
and even this:
x^x
#[1] 1 4 27
In the first two examples the length-one parameter gets recycled to match the length of the longer parameter. That's why the following results in a warning:
y <- 1:2
x^y
#[1] 1 4 3
#Warning message:
# In x^y : longer object length is not a multiple of shorter object length
If you try something like that, you probably want what outer can give you:
outer(x, y, "^")
# [,1] [,2]
#[1,] 1 1
#[2,] 2 4
#[3,] 3 9
Roland already addressed the fact that you can do this vectorized, so I will focus on the loop part in cases where you are doing something more that is not vectorized.
A Java (and C, C++, etc.) style loop like you show is really just a while loop. Something that you would like to do as:
for(I=1; I<=20; I++) { ... }
is really just a different way to write:
I=1 # or better I <- 1
while( I <= 20 ) {
...
I <- I + 1
}
So you already have the tools to do that type of loop. However if you want to assign the results into a vector, matrix, array, list, etc. and each iteration is independent (does not rely on the previous computation) then it is usually easier, clearer, and overall better to use the lapply or sapply functions.
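To make the comparison concrete, here is the same toy computation written both ways. The body x^i is only a stand-in for whatever non-vectorized work each iteration would really do, and the variable names are arbitrary.

```r
x <- 1.03

# while-loop version, mirroring the Java-style for loop
res_while <- numeric(20)    # preallocate the result vector
i <- 1
while (i <= 20) {
  res_while[i] <- x^i
  i <- i + 1
}

# sapply version: each iteration is independent, so no counter is needed
res_sapply <- sapply(1:20, function(i) x^i)

all.equal(res_while, res_sapply)
```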

Double loop in R with a matrix

I can see the error I am getting, but I cannot find a solution. I'm programming a double loop for a Monte Carlo simulation.
set.seed(-1256,normal.kind="Box-Muller")
A <- matrix(Nsimul,85)
for (k in 1:Nsimul) {
r=c()
r[1]=r0_CIR
S=c()
S[1]=I0
A[,1]=r0_CIR
for(j in 1:NumPassi){
epsilon=rnorm(2,0,1)
r[j+1]= r[j]+alphaStar*(gammaStar-r[j])*Deltat + rho*sqrt(r[j])*epsilon[1]*sqrt(Deltat)
if (r[j+1]<0) r[j+1]=abs(r[j+1])
epsilon_S=epsilon[1]+sqrt(1-corr^2)*epsilon[2]
S[j+1]=S[j]*exp((r[j]-sigma^2/2-div)*Deltat+sigma*epsilon_S*sqrt(Deltat))
A[k,j+1]=r[j+1]
}
}
when I try to run the code I have this error
Error in `[<-`(`*tmp*`, , j + 1, value = 0.0102279735166489) : subscript out of bounds
I don't understand which value is out of bounds.
While you may incrementally grow vectors by assigning to indices that don't exist yet (not a great practice, but I digress), for example
S<-c()
S[1]<-1
S
#[1] 1
you may not do the same with the matrix A in your example.
Here is an example matrix I made
A<-matrix(1:10, nrow=5)
# [,1] [,2]
#[1,] 1 6
#[2,] 2 7
#[3,] 3 8
#[4,] 4 9
#[5,] 5 10
and if I try
A[1,3]<-1
I get
#Error in A[1, 3] <- 1 : subscript out of bounds
You have several things undefined in your question, such as Nsimul and NumPassi, so I can't know for sure what's going on, but the final line of the second for loop is most likely the issue. If the value of j + 1 becomes larger than the number of columns in your matrix A, then an error will be thrown.
You must ensure that the column exists before you begin setting it in the second for loop.
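One thing worth checking: the first argument of matrix() is the data and the second is nrow, so matrix(Nsimul, 85) builds an 85 x 1 matrix filled with the value of Nsimul, not an Nsimul x 85 matrix. A minimal sketch of preallocating with explicit dimensions follows. The sizes and the update rule are made-up placeholders, since the real values are not in the question.

```r
n_simul <- 3L    # placeholder for Nsimul
n_steps <- 4L    # placeholder for NumPassi

bad <- matrix(n_simul, 85)
dim(bad)         # 85 rows, 1 column: only column 1 exists

# Preallocate one row per simulation and one column per time point
A <- matrix(NA_real_, nrow = n_simul, ncol = n_steps + 1L)
for (k in seq_len(n_simul)) {
  A[k, 1] <- 0.02                       # placeholder starting value
  for (j in seq_len(n_steps)) {
    A[k, j + 1] <- A[k, j] + 0.001      # placeholder update rule
  }
}
dim(A)
```

With the dimensions named explicitly, A[k, j + 1] stays in bounds for every j up to n_steps.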

R equivalent to Matlab cell2mat function?

I am not very experienced with R or Matlab, and I am trying to convert some Matlab code to R. The problem is, I am not exactly sure what the code does. Here is the Matlab line I am having trouble with:
Wi = cell2mat(accumarray(s_group,s_u,[],@(x){repmat(sqrt(sum(x.*x)+eps),length(x),1)}));
I cannot find an R function that does the same sort of thing as the cell2mat in matlab.
When I run the code with my example data, the matlab gives me an array of length 86, which is the same length as the s_group and s_u variables.
However, when I use the same data with this R code:
Wi<-accumarray(s_group,s_u,sz=c(nrow(s_group),ncol(s_group)),func=function(x) matrix(sqrt(x*x),length(x),1))
it gives me the error
Error in accumarray(s_group, s_u, sz = c(length(s_group), 1), func = function(x) matrix(sqrt(x * :
Argument 'sz' does not fit with 'subs'.
I also tried it without the size specified:
Wi<-accumarray(s_group,s_u,func=function(x) matrix(sqrt(x*x),length(x),1))
and this gave me an array of length 21 with the error:
In A[i] <- func(val[subs == i]) :
number of items to replace is not a multiple of replacement length
Here is the original for-loop version from Matlab:
group_set = unique(group);
group_num = length(group_set);
Wi = zeros(n_XVar, 1);
for c = 1:group_num
idx = find(group==group_set(c));
Wc = u(idx,:);
di = sqrt(sum(sum(Wc.*Wc))+eps);
Wi(idx) = di;
end
Does anyone know what I can do to put this into R without using a for-loop?
Many thanks!
It seems the cell2mat function in Matlab turns a matrix of matrices into a single matrix. A matrix of matrices isn't exactly a common data type in R. But you can make one with
a<-matrix(list(), 2,2)
a[[1,1]]<-matrix(1, nrow=1)
a[[1,2]]<-matrix(2:4, nrow=1)
a[[2,1]]<-matrix(c(5,9), nrow=2)
a[[2,2]]<-matrix(c(6:8, 10:12), nrow=2, byrow=T)
(like the example on the Matlab help page for cell2mat). An R translation of that code might be
cell2mat<-function(m)
do.call(rbind, apply(m, 1, function(x) do.call(cbind,x)))
Which we can test with
cell2mat(a)
# [,1] [,2] [,3] [,4]
# [1,] 1 2 3 4
# [2,] 5 6 7 8
# [3,] 9 10 11 12
This is probably not the most efficient translation but it's probably the simplest.
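As for the for-loop version in the question, it assigns to every element the root of the summed squares of its group, and that can be written without a loop using ave() from base R. The following is only a sketch on made-up data: group and u stand in for the question's variables, and .Machine$double.eps is used on the assumption that it plays the same role as Matlab's eps.

```r
group <- c(1, 2, 1, 3, 2)                 # made-up group labels
u <- matrix(c(3, 1, 4, 2, 2,
              0, 1, 0, 2, 1), ncol = 2)   # made-up data, one row per element
eps <- .Machine$double.eps

# Sum of squares per row, then summed within each group and
# broadcast back to every member of that group by ave()
Wi <- sqrt(ave(rowSums(u^2), group, FUN = sum) + eps)
Wi
```

The result has one entry per row of u, and rows in the same group share the same value, which is what the Matlab loop does with Wi(idx) = di.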
