Does anyone know a more efficient way to do the following? I have two matrices, one with integer values and the other with numeric values, plus a list of integer vectors. A loop sets the entries of the numeric matrix to NA wherever the integer matrix matches one of the values in the current list element. I then take the column products of the resulting matrix and sum them. Is there a way to avoid creating a copy of the numeric matrix at each step? Or perhaps another approach altogether? Thanks
mat1<-matrix(rpois(20*300000,6),20,300000)
mat2<-matrix(runif(20*300000),20,300000)
list1<-list(c(1,2,3),c(4,6),c(8,9,10,11))
results<-vector('numeric',length(list1))
start.time=Sys.time()
for(i in 1:length(list1)){
copy<-mat2
copy[mat1 %in% list1[[i]]]=NA
results[i]=sum(apply(copy,2,prod,na.rm=T))
}
print(Sys.time()-start.time)
#
Replacing your apply with colSums on the log scale (this assumes your numbers are positive; otherwise you'll need a bit more fiddling) gives me close to a 2x speed improvement:
for(i in 1:length(list1)){
copy<-mat2
copy[mat1 %in% list1[[i]]]=NA
results[i]=sum(exp(colSums(log(copy), na.rm = T)))
}
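If copying mat2 on every pass is the main concern, a minimal sketch along the same lines is to take the logarithm once and multiply it by a keep-mask, instead of writing NA into a fresh copy. This assumes all values are strictly positive (as with runif); the smaller matrix size, the seed, and the names log_mat2, fast and slow are only for illustration.

```r
set.seed(1)
mat1 <- matrix(rpois(20 * 1000, 6), 20, 1000)
mat2 <- matrix(runif(20 * 1000), 20, 1000)
list1 <- list(c(1, 2, 3), c(4, 6), c(8, 9, 10, 11))

# Take the log once, outside the loop (assumes mat2 > 0 everywhere)
log_mat2 <- log(mat2)

fast <- vapply(list1, function(vals) {
  # TRUE where the entry is kept; masked entries contribute log(1) = 0
  keep <- !(mat1 %in% vals)
  sum(exp(colSums(log_mat2 * keep)))
}, numeric(1))

# Reference: the original copy-and-NA approach
slow <- vapply(list1, function(vals) {
  copy <- mat2
  copy[mat1 %in% vals] <- NA
  sum(apply(copy, 2, prod, na.rm = TRUE))
}, numeric(1))

all.equal(fast, slow)
```

The multiplication still allocates a temporary matrix, so this mainly saves the repeated log() and the NA handling rather than all memory traffic.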
If I repeat this code
x<-1:6
n<-40
M<-200
y<-replicate(M,as.numeric(table(sample(x,n,1))))
str(y)
sometimes R decides to create a matrix and sometimes it creates a list. Can you explain the reason for that? How can I be sure that I get a matrix, or a list?
If you choose M very small, for example 10, it will almost always create a matrix. If you choose M very large, for example 2000, it will create a list.
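You get a list when not all the numbers in x are sampled in some replicate: table() then returns fewer than six counts, the results have unequal lengths, and replicate() cannot simplify them into a matrix.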
You can always return a list by using simplify = FALSE.
y <- replicate(M, as.numeric(table(sample(x,n,TRUE))), simplify = FALSE)
Also, you are using 1 for the replace argument. It is better to pass a logical value, i.e. TRUE.
To always return a matrix, we can do:
sapply(y, `[`, x)
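This pads the shorter vectors with NA so that every column has length(x) entries. Note that the NAs go at the end of each vector, not at the position of the value that was never sampled, because as.numeric() drops the names of the table.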
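Another option, sketched here as an alternative rather than as part of the answer above, is to tabulate a factor with fixed levels. table() then reports a zero count for any value that was not sampled, so every replicate has length(x) entries and replicate() can always simplify to a matrix. The seed is only there to make the run reproducible.

```r
x <- 1:6
n <- 40
M <- 200
set.seed(42)

# factor(..., levels = x) keeps a slot for every value of x,
# so table() returns exactly length(x) counts each time
y <- replicate(M, as.numeric(table(factor(sample(x, n, replace = TRUE), levels = x))))

is.matrix(y)
dim(y)
```

Each column of y holds the counts for one replicate, in the order of x.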
Maybe this will help:
https://rafalab.github.io/dsbook/r-basics.html#data-types
The columns of a matrix all have to be of the same type and length.
The elements of a list can be of different classes and lengths.
Try this:
x<-1
y<-2:7
z<-matrix(x,y)
z<-list(x,y)
In the first case y is interpreted as the nrow argument rather than as data, and you get a matrix with 2 rows and 1 column.
In the second case you get a list whose elements have different lengths.
Also, the str() function is very useful, and you can find the class of an object using the class() function.
I want to test whether every element of a data frame is greater than 0. If it is greater than zero the value should become "buy", otherwise "sell". I used sapply, but it assigned "sell" to every value. I used the following code. A for-loop solution would also be welcome.
df1<-sapply(df,function(x) ifelse(x>0,yes="buy",no="sell"))
If it is a matrix (or even a data.frame), create a logical matrix with the comparison operator. This gives a TRUE/FALSE matrix, which is treated as 1/0 in arithmetic. Adding 1 changes that to 2/1, and the result can be used as an index into c("sell", "buy") to pick the replacement values (in R, indexing starts from 1):
df[] <- c("sell", "buy")[(df >0) + 1]
Also, in the comments, it was recommended not to use sapply on a matrix, because a matrix is a vector with a dim attribute and its unit is a single element (in a data.frame the unit is a column, so sapply/lapply loops through columns). On a matrix, sapply loops through every element, which may not be efficient. For a matrix, apply with MARGIN can be used:
df[] <- apply(df, 2, FUN = function(x) ifelse(x > 0, "buy", "sell"))
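Since the question also asked for a loop, here is a small sketch that compares the index trick with a plain for loop over columns. The data frame df and the names labels and out are made up for illustration.

```r
# Hypothetical toy data
df <- data.frame(a = c(-1, 2, 0), b = c(3, -4, 5))

# Vectorised: (df > 0) + 1 is 2 where positive and 1 otherwise
labels <- df
labels[] <- c("sell", "buy")[(df > 0) + 1]

# Loop version: one column at a time
out <- df
for (j in seq_along(df)) {
  out[[j]] <- ifelse(df[[j]] > 0, "buy", "sell")
}

# Both approaches should agree element by element
all(as.matrix(out) == as.matrix(labels))
```

Working on copies (labels, out) keeps the original numeric df intact, which is handy if the values are needed later.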
I have extracted the array indices of some elements I want to look at as follows:
mat = matrix(0,10,10)
arrInd = which(mat ==0,arr.ind = T)
Then I do some more operations on this matrix and eventually end up with a vector of rows rowInd and a vector of columns colInd. I want to use these indices to insert values into another matrix, say mat2. But I can't seem to figure out a way to do this without looping or doing the modular arithmetic calculation myself. I realize I could take something like
mat2[nrow(mat2)*(colInd-1)+rowInd]
In order to transform back to the 1-d indexing. But since R usually has built in functions to do this sort of thing, I was wondering if there is any more concise way to do this? It would just seem natural that such a handy data-manipulation function like which(,arr.ind=T) would have a handy inverse.
I also tried using mat2[rowInd,colInd], but this did not work.
Have a read of R intro: indexing a matrix, which covers the use of matrix indexing. which(, arr.ind = TRUE) returns a two-column matrix suitable for direct use as an index matrix. For example:
A <- matrix(c(1L,2L,2L,1L), 2)
iv <- which(A == 1L, arr.ind = TRUE)
# row col
#[1,] 1 1
#[2,] 2 2
A[iv]
# [1] 1 1
If you have another matrix B which you want to update values according to iv, just do
B[iv] <- replacement
Maybe for some reason you've separated row index and column index into rowInd and colInd. In that case, just use
cbind(rowInd, colInd)
as indexing matrix.
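As a small sketch of that last point, with made-up values for mat2, rowInd and colInd, the two-column index matrix and the hand-computed linear index address the same cells:

```r
mat2 <- matrix(0, 3, 3)
rowInd <- c(1, 3, 2)
colInd <- c(2, 2, 3)

# Each row of the index matrix is one (row, column) pair
mat2[cbind(rowInd, colInd)] <- c(10, 20, 30)

# Equivalent column-major linear index, computed by hand
lin <- (colInd - 1) * nrow(mat2) + rowInd
mat2[lin]
```

Note that mat2[rowInd, colInd] instead selects the whole sub-block formed by all combinations of those rows and columns, which is why it did not behave as hoped.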
I have a matrix and I want to create a list with selected rows of that matrix being the list elements.
For example this is my matrix
my.matrix=matrix(1:100, nrow=20)
and I want to create a list from this matrix such a way that each element of this list is part of the matrix and the row index of each part is defined by
my.n=c(1,2,4,3,5,5)
where my.n gives the number of rows that should be extracted from my.matrix. my.n[1]=1 means row 1; my.n[2]=2 means row 2,3; my.n[3]=4 means rows 4 to 7 and so on.
So the first element of my list should be
my.matrix[1,]
second
my.matrix[2:3,]
and so on.
How to do it in an elegant way?
Not quite sure, but I think you want something like this ...
S <- split(seq_len(nrow(my.matrix)), rep.int(seq_along(my.n), my.n))
lapply(S, function(x) my.matrix[x, , drop = FALSE])
Here we are splitting the row numbers of my.matrix by replications of my.n. Then we use lapply() over the resulting list S to subset my.matrix with those row numbers.
end <- cumsum(my.n)
start <- c(1,(end+1)[-length(end)])
mapply(function(a,b) my.matrix[a:b,,drop=FALSE], start, end, SIMPLIFY=FALSE)
mapply calls the function with the first elements of start and end, then with the second elements, and so on through both vectors. With SIMPLIFY = FALSE the results are always returned as a list, which gives the list of row subsets described in the question. credit to #nongkrong for the mapply approach.
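To see that the two answers give the same pieces, here is a short sketch that runs both on the example from the question. The names by_split and by_range are only for illustration.

```r
my.matrix <- matrix(1:100, nrow = 20)
my.n <- c(1, 2, 4, 3, 5, 5)

# Approach 1: split the row numbers into groups of sizes my.n
groups <- rep(seq_along(my.n), times = my.n)
by_split <- lapply(split(seq_len(nrow(my.matrix)), groups),
                   function(r) my.matrix[r, , drop = FALSE])

# Approach 2: start/end row of each block from the cumulative sum
end <- cumsum(my.n)
start <- end - my.n + 1
by_range <- mapply(function(a, b) my.matrix[a:b, , drop = FALSE],
                   start, end, SIMPLIFY = FALSE)

# Number of rows in each piece
sapply(by_range, nrow)
```

Both approaches rely on sum(my.n) being equal to nrow(my.matrix); drop = FALSE keeps single-row pieces as matrices.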
I'm stuck with a simple loop that takes more than an hour to run, and need help to speed it up.
Basically, I have a matrix with 31 columns and 400 000 rows. The first 30 columns have values, and the 31st column has a column-number. I need to, per row, retrieve the value in the column indicated by the 31st column.
Example row: [26,354,72,5987..,461,3] (this means that the value in column 3 is sought after (72))
The too slow loop looks like this:
a <- rep(0,nrow(data)) #To pre-allocate memory
for (i in 1:nrow(data)) {
a[i] <- data[i,data[i,31]]
}
I would think this would work:
a <- data[,data[,31]]
... but it results in "Error: cannot allocate vector of size 2.8 Mb".
I fear that this is a really simple question, so I've spent hours trying to understand apply, lapply, reshape, and more, but somehow I can't get a grip on the vectorization concept in R.
The matrix actually has even more columns that also go into the a-parameter, which is why I don't want to rebuild the matrix, or split it.
Your support is highly appreciated!
Chris
t(data[,1:30])[30*(0:399999)+data[,31]]
This works because you can index a matrix either in array (row, column) format or as a single long vector (here of length 400000*30, once column 31 is left out), counted column by column. To count row by row instead, use the transpose.
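A small sanity check of this idea, on a made-up 5-row version of the problem (the names vals, loop, via_t and via_idx are hypothetical), compares the loop from the question with the transpose trick and with a two-column index matrix:

```r
set.seed(1)
vals <- matrix(rpois(5 * 30, 50), 5, 30)
# Column 31 holds the column number to look up in each row
data <- cbind(vals, sample(30, 5, replace = TRUE))
n <- nrow(data)

# Original loop, for reference
loop <- numeric(n)
for (i in seq_len(n)) loop[i] <- data[i, data[i, 31]]

# Transpose trick: row i of data becomes column i of t(...), 30 entries long
via_t <- t(data[, 1:30])[30 * (seq_len(n) - 1) + data[, 31]]

# Two-column (row, column) index matrix
via_idx <- data[cbind(seq_len(n), data[, 31])]

all(loop == via_t)
all(loop == via_idx)
```

The index-matrix form avoids building the transposed copy, which may matter with 400 000 rows.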
Single-index notation for the matrix may use less memory. This would involve doing something like:
i <- nrow(data)*(data[,31]-1) + 1:nrow(data)
a <- data[i]
Below is an example of single-index notation for matrices in R. In this example, the index of the per-row maximum is appended as the last column of a random matrix. This last column is then used to select the per-row maxima via single-index notation.
## create a random (10 x 5) matrix
M <- matrix(rpois(50,50),10,5)
## use the last column to index the maximum value of the first 5
## columns
MM <- cbind(M,apply(M,1,which.max))
## column ID row ID
i <- nrow(MM)*(MM[,ncol(MM)]-1) + 1:nrow(MM)
all(MM[i] == apply(M,1,max))
Using an index matrix is an alternative that will probably use more memory but is slightly clearer:
ii <- cbind(1:nrow(MM),MM[,ncol(MM)])
all(MM[ii] == apply(M,1,max))
Try changing the code to work a column at a time:
M <- matrix(rpois(30*400000,50),400000,30)
MM <- cbind(M,apply(M,1,which.max))
a <- rep(0,nrow(MM))
for (i in 1:(ncol(MM)-1)) {
a[MM[, ncol(MM)] == i] <- MM[MM[, ncol(MM)] == i, i]
}
On each pass, this fills the elements of a whose row has value i in the last column with the corresponding values from column i. It took longer to build the matrix than to calculate the vector a.