Assignment of value in R

How can I improve the speed of the following code?
for (i in 1:nrow(training)){
score[training[i,1],training[i,2],training[i,4]] = training[i,3]
}
training is a matrix with four columns. I just want to build an array whose values are training[i,3], placed at the positions given by the indexing above.
Thanks!

You can index using a matrix. Here is the relevant part of the documentation for [ (see ?Extract):
A third form of indexing is via a numeric matrix with the one
column for each dimension: each row of the index matrix then
selects a single element of the array, and the result is a vector.
So in your case, the for loop can be replaced with:
score[training[, c(1, 2, 4)]] <- training[, 3]
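A small self-contained check of this replacement might look like the following. The training matrix and the array dimensions are made up for illustration; they are not the asker's data.

```r
# Hypothetical data: columns 1, 2 and 4 hold array indices, column 3 the value
training <- cbind(c(1, 2, 3, 1),      # index into dimension 1
                  c(1, 1, 2, 2),      # index into dimension 2
                  c(10, 20, 30, 40),  # value to store
                  c(1, 2, 1, 2))      # index into dimension 3

# Original approach: one assignment per row
score_loop <- array(NA_real_, dim = c(3, 2, 2))
for (i in 1:nrow(training)) {
  score_loop[training[i, 1], training[i, 2], training[i, 4]] <- training[i, 3]
}

# Matrix indexing: each row of the 3-column index matrix selects one element
score_vec <- array(NA_real_, dim = c(3, 2, 2))
score_vec[training[, c(1, 2, 4)]] <- training[, 3]

identical(score_loop, score_vec)
```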

Related

R create list or matrix

If I repeat this code
x<-1:6
n<-40
M<-200
y<-replicate(M,as.numeric(table(sample(x,n,1))))
str(y)
sometimes R decides to create a matrix and sometimes it creates a list. Can you explain the reason for that? How can I be sure whether I get a matrix or a list?
If you choose M very small, for example 10, it will almost always create a matrix. If you choose M very large, for example 2000, it will create a list.
You get a list when not all the numbers in x are sampled: table() then returns fewer than six counts, the replicates have different lengths, and replicate() cannot simplify them into a matrix.
You can always return a list by using simplify = FALSE.
y <- replicate(M, as.numeric(table(sample(x,n,TRUE))), simplify = FALSE)
Also, you are using 1 to set the replace argument. It is better to use a logical value, i.e. TRUE.
To always return a matrix, we can do:
sapply(y, `[`, x)
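This will append NAs to the vectors that are shorter than the others. Note that the padding is positional: as.numeric() drops the names from table(), so the NA does not mark which value was missing.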
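If the goal is a matrix whose rows line up with the values in x, one option (a sketch, not part of the answer above) is to turn the sample into a factor with fixed levels. table() then reports a zero for values that were not drawn, so every replicate has the same length and replicate() can simplify to a matrix.

```r
x <- 1:6
n <- 40
M <- 200

# factor(..., levels = x) keeps a slot for every value of x,
# even when that value does not occur in the sample
y <- replicate(M, as.numeric(table(factor(sample(x, n, replace = TRUE), levels = x))))

is.matrix(y)
dim(y)  # one row per value of x, one column per replicate
```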
Maybe this will help:
https://rafalab.github.io/dsbook/r-basics.html#data-types
All elements of a matrix must have the same type, and all of its columns the same length.
The elements of a list can have different classes and lengths.
Try this:
x<-1
y<-2:7
z<-matrix(x,y)
z<-list(x,y)
In the first case you will get a matrix with 2 rows and 1 column, because y is passed as the nrow argument rather than as data.
In the second case you will get a list with elements of different lengths.
Also, the str() function is very useful, and you can find the class of an object using the class() function.

R; Populating a matrix with a for loop iterating over a vector

I am not very experienced with R, and have been struggling for days to repeat a string of code to fill a data matrix. My instinct is to create a for loop.
I am a biology student working on colour differences between sets of images, making use of the R package colordistance. The relevant data has been loaded in R as a list of 8x4 matrices (each matrix describes the colours in one image). Five images make up one set and there are 100 sets in total. Each set is identified by a number (not 1-100, it's an interrupted sequence, but I have stored the sequence of numbers in a vector called 'numberlist'). I have written the code to extract the desired data in the right format for the first set, and it is as follows;
## extract the list of matrices belonging to the first set (A3) from the full list
A3<-histlist[grep('^3',names(histlist))]
## create a colour distance matrix (cdm), ie a pairwise comparison of "similarity" between the five matrices stored in A3
cdm3<-colordistance::getColorDistanceMatrix(A3, method="emd", plotting=FALSE)
## convert to data frame to fix row names
cdm3df<-as.data.frame(cdm3)
## remove column names
names(cdm3df)<-NULL
## return elements in the first row and column 2-5 only (retains row names).
cdm3filtered<-cdm3df[1,2:5]
Now I want to replace "3" in the code above with each number in 'numberlist' (not sure whether they should be as.factor or as.numeric). I've had many attempts starting with for (i in numberlist) {...} but with no successful output. To me it makes sense to store the output from the loop in a storage matrix; matrix(nrow=100,ncol=4) but I am very much stuck, and unable to populate my storage matrix row by row by iterating the code above...
Any help would be greatly appreciated!
Updates
What I want the outputs of the loop to look like (+ appended in the storage matrix);
> cdm17filtered
17clr 0.09246918 0.1176651 0.1220622 0.1323586
This is my attempt:
for (i in numberlist$X) {
A[i] <- histlist[grep(paste0('^',i),names(histlist))]
cdm[i] <- colordistance::getColorDistanceMatrix(A[i], method="emd", plotting=FALSE)
cdm[i]df <- as.data.frame(cdm[i])
cdm[i]filtered <- cdm[i]df[1,2:5]
print(A[i]) # *insert in n'th column of storage matrix
}
The above is not working, and I'm missing the last bit needed to store the outputs of the loop in the storage matrix. (I was advised against using rbind to populate the storage matrix because it is slow..)
In your attempt, you use invalid R names with non-alphanumeric characters not escaped, cdm[i]df and cdm[i]filtered. It seems you intend to index from a larger container like a list of objects.
To properly generalize your process for all items in numberlist, adjust your ^3 setup. Specifically, build empty lists and, in the loop, assign each result by position with [[k]]. Single brackets would try to squeeze a whole list into one slot, and the set numbers themselves are not consecutive positions:
# SET IDENTIFIERS (numberlist$X AS IN YOUR ATTEMPT; USE numberlist ITSELF IF IT IS A PLAIN VECTOR)
ids <- numberlist$X
# INITIALIZE LISTS (ONE SLOT PER SET)
A <- vector(mode="list", length = length(ids))
cdm_matrices <- vector(mode="list", length = length(ids))
cdm_dfs <- vector(mode="list", length = length(ids))
cdm_filtered_dfs <- vector(mode="list", length = length(ids))
# POPULATE LISTS
for (k in seq_along(ids)) {
  i <- ids[k]
  ## extract the list of matrices belonging to set i
  A[[k]] <- histlist[grep(paste0('^', i), names(histlist))]
  ## create a colour distance matrix (cdm)
  cdm_matrices[[k]] <- colordistance::getColorDistanceMatrix(A[[k]], method="emd", plotting=FALSE)
  ## convert to data frame to fix row names and remove column names
  cdm_dfs[[k]] <- setNames(as.data.frame(cdm_matrices[[k]]), NULL)
  ## return elements in the first row and column 2-5 only (retains row names).
  cdm_filtered_dfs[[k]] <- cdm_dfs[[k]][1,2:5]
}
Alternatively, if you only need the last object, cdm_filtered_df returned, use lapply where you do not need to use or index lists and all objects are local in scope of function (i.e., never saved in global environment):
cdm_build <- function(i) {
A <- histlist[grep(paste0('^', i), names(histlist))]
cdm <- colordistance::getColorDistanceMatrix(A, method="emd", plotting=FALSE)
cdm_df <- setNames(as.data.frame(cdm), NULL)
cdm_filtered_df <- cdm_df[1,2:5]
return(cdm_filtered_df) # REDUNDANT AS LAST LINE IS RETURNED BY DEFAULT
}
# LIST OF FILTERED CDM DATA FRAMES (numberlist$X AS IN THE LOOP; USE numberlist ITSELF IF IT IS A PLAIN VECTOR)
cdm_filtered_dfs <- lapply(numberlist$X, cdm_build)
Finally, with either solution above, should you want to build a singular data frame, run rbind in a do.call():
cdm_final_df <- do.call(rbind, cdm_filtered_dfs)
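For the preallocated storage matrix mentioned in the question, the rows can also be filled one at a time by position. The sketch below runs on its own: fake_row() is a hypothetical stand-in for the colordistance steps, and the set numbers are made up.

```r
# Made-up set identifiers (an interrupted sequence, as in the question)
set_ids <- c(3, 17, 21)

# Hypothetical stand-in for the colordistance steps: returns 4 numbers per set
fake_row <- function(id) id + c(0.1, 0.2, 0.3, 0.4)

# Preallocate the storage matrix: one row per set, four columns
storage <- matrix(NA_real_, nrow = length(set_ids), ncol = 4,
                  dimnames = list(paste0(set_ids, "clr"), NULL))

# Fill by row position k, not by set number
for (k in seq_along(set_ids)) {
  storage[k, ] <- fake_row(set_ids[k])
}
storage
```

With the real data, the stand-in would presumably be replaced by something like unlist(cdm_build(i)), which flattens the one-row data frame into four numbers.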

R_List from a selected rows of matrix

I have a matrix and I want to create a list with selected rows of that matrix being the list elements.
For example this is my matrix
my.matrix=matrix(1:100, nrow=20)
and I want to create a list from this matrix in such a way that each element of the list is a block of rows of the matrix, with the size of each block defined by
my.n=c(1,2,4,3,5,5)
where my.n gives the number of rows that should be extracted from my.matrix. my.n[1]=1 means row 1; my.n[2]=2 means row 2,3; my.n[3]=4 means rows 4 to 7 and so on.
So the first element of my list should be
my.matrix[1,]
second
my.matrix[2:3,]
and so on.
How to do it in an elegant way?
Not quite sure, but I think you want something like this ...
S <- split(seq_len(nrow(my.matrix)), rep.int(seq_along(my.n), my.n))
lapply(S, function(x) my.matrix[x, , drop = FALSE])
Here we are splitting the row numbers of my.matrix by replications of my.n. Then we use lapply() over the resulting list S to subset my.matrix with those row numbers.
end <- cumsum(my.n)
start <- c(1,(end+1)[-length(end)])
mapply(function(a,b) my.matrix[a:b,,drop=FALSE], start, end, SIMPLIFY=FALSE)
mapply calls the function with the first elements of start and end, then with the second elements, and so on through both vectors, so each call extracts one block of rows. SIMPLIFY=FALSE makes sure the result is always a list, even when all blocks happen to have the same size. Credit to @nongkrong for the mapply approach.
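As a quick check, the two approaches can be run side by side on the example from the question. The names by_split and by_range are introduced here only for the comparison.

```r
my.matrix <- matrix(1:100, nrow = 20)
my.n <- c(1, 2, 4, 3, 5, 5)

# Approach 1: split the row numbers into groups of sizes my.n
S <- split(seq_len(nrow(my.matrix)), rep.int(seq_along(my.n), my.n))
by_split <- lapply(S, function(x) my.matrix[x, , drop = FALSE])

# Approach 2: compute the first and last row of each block
end <- cumsum(my.n)
start <- c(1, (end + 1)[-length(end)])
by_range <- mapply(function(a, b) my.matrix[a:b, , drop = FALSE],
                   start, end, SIMPLIFY = FALSE)

# Same blocks either way (split() adds names, so drop them before comparing)
identical(unname(by_split), by_range)
sapply(by_range, nrow)  # block sizes, which should match my.n
```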

R array indexing for multi-dimensional arrays

I have a simple array indexing question for multi-dimensional arrays in R. I am doing a lot of simulations that each give a result as a matrix, where the entries are classified into categories. So for example a result looks like
aresult<-array(sample(1:3, 10, replace=TRUE), dim=c(2,5),
dimnames=list(
c("prey1", "prey2"),
c("predator1", "predator2", "predator3", "predator4", "predator5")))
Now I want to store the results of my experiments in a 3D array, where the first two dimensions are the same as in aresult and the third dimension holds the number of experiments that fell into each category. So my array of counts should look like
Counts<-array(0, dim=c(2, 5, 3),
dimnames=list(
c("prey1", "prey2"),
c("predator1", "predator2", "predator3", "predator4", "predator5"),
c("n1", "n2", "n3")))
and after each experiment I want to increment the numbers in the third dimension by 1, using the values in aresult as indexes.
How can I do that without using loops?
This sounds like a typical job for matrix indexing. By subsetting Counts with a three column matrix, each row specifying the indices of an element we want to extract, we are free to extract and increment any elements we like.
# Create a map of all combinations of indices in the first two dimensions
i <- expand.grid(prey=1:2, predator=1:5)
# Add the indices of the third dimension
i <- as.matrix( cbind(i, as.vector(aresult)) )
# Extract and increment
Counts[i] <- Counts[i] + 1
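Put together, a run over several experiments might look like the sketch below. The number of experiments and the random results are made up, and the names cells and idx are introduced here for illustration.

```r
set.seed(42)
prey <- c("prey1", "prey2")
pred <- paste0("predator", 1:5)
Counts <- array(0, dim = c(2, 5, 3),
                dimnames = list(prey, pred, c("n1", "n2", "n3")))

# Index of every cell in the first two dimensions
# (prey varies fastest, matching the order of as.vector(aresult))
cells <- as.matrix(expand.grid(prey = 1:2, predator = 1:5))

for (run in 1:4) {  # four made-up experiments
  aresult <- array(sample(1:3, 10, replace = TRUE), dim = c(2, 5),
                   dimnames = list(prey, pred))
  # Third column: the category each cell fell into in this experiment
  idx <- cbind(cells, as.vector(aresult))
  Counts[idx] <- Counts[idx] + 1
}

# Each prey/predator cell is counted exactly once per experiment
apply(Counts, c(1, 2), sum)
```

Within one experiment every row of idx points at a different element, so the vectorised increment does not lose any counts.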

Return value from column indicated in same row

I'm stuck with a simple loop that takes more than an hour to run, and need help to speed it up.
Basically, I have a matrix with 31 columns and 400 000 rows. The first 30 columns have values, and the 31st column has a column-number. I need to, per row, retrieve the value in the column indicated by the 31st column.
Example row: [26,354,72,5987..,461,3] (this means that the value in column 3 is sought after (72))
The too slow loop looks like this:
a <- rep(0,nrow(data)) #To pre-allocate memory
for (i in 1:nrow(data)) {
a[i] <- data[i,data[i,31]]
}
I would think this would work:
a <- data[,data[,31]]
... but it results in "Error: cannot allocate vector of size 2.8 Mb".
I fear that this is a really simple question, so I've spent hours trying to understand apply, lapply, reshape, and more, but somehow I can't get a grip on the vectorization concept in R.
The matrix actually has even more columns that also go into the a-parameter, which is why I don't want to rebuild the matrix, or split it.
Your support is highly appreciated!
Chris
t(data[,1:30])[30*(0:399999)+data[,31]]
This works because you can index a matrix both in array form and as a single vector (here, a vector of length 400000*30 for the transposed block), counting column-wise first. To count row-wise, you use the transpose.
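A scaled-down check of the transpose trick against the loop from the question could look like this. The six-row matrix is made up, and 0:399999 is generalised to 0:(n - 1).

```r
set.seed(1)
n <- 6
vals <- matrix(rpois(n * 30, 50), n, 30)
# Column 31 holds, per row, the number of the column to read
data <- cbind(vals, sample(1:30, n, replace = TRUE))

# Loop from the question
a_loop <- rep(0, n)
for (i in 1:n) {
  a_loop[i] <- data[i, data[i, 31]]
}

# Transpose trick: in t(data[, 1:30]) the values of row r occupy
# positions 30 * (r - 1) + 1 to 30 * r of the underlying vector
a_vec <- t(data[, 1:30])[30 * (0:(n - 1)) + data[, 31]]

all(a_loop == a_vec)
```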
Single-index notation on the original matrix may use less memory. This would involve doing something like:
i <- nrow(data)*(data[,31]-1) + 1:nrow(data)
a <- data[i]
Below is an example of single-index notation for matrices in R. In this example, the index of the per-row maximum is appended as the last column of a random matrix. This last column is then used to select the per-row maxima via single-index notation.
## create a random (10 x 5) matrix
M <- matrix(rpois(50,50),10,5)
## use the last column to index the maximum value of the first 5
## columns
MM <- cbind(M,apply(M,1,which.max))
## column ID row ID
i <- nrow(MM)*(MM[,ncol(MM)]-1) + 1:nrow(MM)
all(MM[i] == apply(M,1,max))
Using an index matrix is an alternative that will probably use more memory but is slightly clearer:
ii <- cbind(1:nrow(MM),MM[,ncol(MM)])
all(MM[ii] == apply(M,1,max))
Try to change the code to work a column at a time:
M <- matrix(rpois(30*400000,50),400000,30)
MM <- cbind(M,apply(M,1,which.max))
a <- rep(0,nrow(MM))
for (i in 1:(ncol(MM)-1)) {
a[MM[, ncol(MM)] == i] <- MM[MM[, ncol(MM)] == i, i]
}
This sets all elements in a with the values from column i if the last column has value i. It took longer to build the matrix than to calculate vector a.
