Use values from 2 matrices to index a third in R - r

I'm optimizing a more complex code, but got stuck with this problem.
a<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
m<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
f<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
g<-array(NA,c(10,10))
I need to use the values in a & m to index f and assign the value from f to g
i.e. g[1,1]<-f[a[1,1],m[1,1]] except for all the indexes, and as optimally/fast as possible
I could obviously make a for loop to do this for me but that seems rather dumb and slow. It seems like I should be able to us something in the apply family, but I've had no luck with figuring out how to do that. I do need to keep the data structured as it is here so that I can use matrix operations in different parts of my code. I've been searching for an answer to this but haven't found anything particularly helpful yet.

g[] <- f[cbind(c(a), c(m))]
This takes advantage of the fact that matrices can be addressed as vectors and using a matrix as the index.

Related

Replace for loop with vectorized call of a function returning multiple values

I have the following function: problema_firma_emprestimo(r,w,r_emprestimo,posicao,posicao_banco), where all input are scalars.
This function return three different matrix, using
return demanda_k_emprestimo,demanda_l_emprestimo,lucro_emprestimo
I need to run this function for a series of values of posicao_banco that are stored in a vector.
I'm doing this using a for loop, because I need three separate matrix with each of them storing one of the three outputs of the function, and the first dimension of each matrix corresponds to the index of posicao_banco. My code for this part is:
demanda_k_emprestimo = zeros(num_bancos,na,ny);
demanda_l_emprestimo = similar(demanda_k_emprestimo);
lucro_emprestimo = similar(demanda_k_emprestimo);
for i in eachindex(posicao_bancos)
demanda_k_emprestimo[i,:,:] , demanda_l_emprestimo[i,:,:] , lucro_emprestimo[i,:,:] = problema_firma_emprestimo(r,w,r_emprestimo[i],posicao,posicao_bancos[i]);
end
Is there a fast and clean way of doing this using vectorized functions? Something like problema_firma_emprestimo.(r,w,r_emprestimo[i],posicao,posicao_bancos) ? When I do this, I got a tuple with the result, but I can't find a good way of unpacking the answer.
Thanks!
Unfortunately, it's not easy to use broadcasting here, since then you will end up with output that is an array of tuples, instead of a tuple of arrays. I think a loop is a very good approach, and has no performance penalty compared to broadcasting.
I would suggest, however, that you organize your output array dimensions differently, so that i indexes into the last dimension instead of the first:
for i in eachindex(posicao_bancos)
demanda_k_emprestimo[:, :, i] , ...
end
This is because Julia arrays are column major, and this way the output values are filled into the output arrays in the most efficient way. You could also consider making the output arrays into vectors of matrices, instead of 3D arrays.
On a side note: since you are (or should be) creating an MWE for the sake of the people answering, it would be better if you used shorter and less confusing variable names. In particular for people who don't understand Portuguese (I'm guessing), your variable names are super long, confusing and make the code visually dense. Telling the difference between demanda_k_emprestimo and demanda_l_emprestimo at a glance is hard. The meaning of the variables are not important either, so it's better to just call them A and B or X and Y, and the functions foo or something.

Am I using the most efficient (or right) R instructions?

first question, I'll try to go straight to the point.
I'm currently working with tables and I've chosen R because it has no limit with dataframe sizes and can perform several operations over the data within the tables. I am happy with that, as I can manipulate it at my will, merges, concats and row and column manipulation works fine; but I recently had to run a loop with 0.00001 sec/instruction over a 6 Mill table row and it took over an hour.
Maybe the approach of R was wrong to begin with, and I've tried to look for the most efficient ways to run some operations (using list assignments instead of c(list,new_element)) but, since as far as I can tell, this is not something that you can optimize with some sort of algorithm like graphs or heaps (is just tables, you have to iterate through it all) I was wondering if there might be some other instructions or other basic ways to work with tables that I don't know (assign, extract...) that take less time, or configuration over RStudio to improve performance.
This is the loop, just so if it helps to understand the question:
my_list <- vector("list",nrow(table[,"Date_of_count"]))
for(i in 1:nrow(table[,"Date_of_count"])){
my_list[[i]] <- format(as.POSIXct(strptime(table[i,"Date_of_count"]%>%pull(1),"%Y-%m-%d")),format = "%Y-%m-%d")
}
The table, as aforementioned, has over 6 Mill rows and 25 variables. I want the list to be filled to append it to the table as a column once finished.
Please let me know if it lacks specificity or concretion, or if it just does not belong here.
In order to improve performance (and properly work with R and tables), the answer was a mixture of the first comments:
use vectors
avoid repeated conversions
if possible, avoid loops and apply functions directly over list/vector
I just converted the table (which, realized, had some tibbles inside) into a dataframe and followed the aforementioned keys.
df <- as.data.frame(table)
In this case, by doing this the dates were converted directly to character so I did not have to apply any more conversions.
New execution time over 6 Mill rows: 25.25 sec.

Best combination of lists that provides more unique values

Not sure if someone could help me with this problem.
I have 5 lists of values of different lengths.
Note: Same value can be presence in different lists.
Does anyone know how to get the combination of 3 lists that will provide more total unique values?
Thanks in advance,
Miguel
I do not really have an answer to your question, which seems to be more of a combinatorics question than programming. My sense is that if you want an exact solution you will have to try all the possible combinations of subsets of 3 lists out of 5 (there are 10 of them). One thing to remember if you go that way is that if you want the number of unique elements of the concatenation of 3 lists you do not have necessarily to do length(unique(c(l1,l2,l3)) which I imagine could be inefficient if you have very long lists. You can use the formula for the size of the intersection of 3 sets, which you can find for example at https://math.stackexchange.com/questions/669249/probability-of-the-union-of-3-events .
This will require you only to compute the length of all the possible intersections of the lists. it could be a completely academic exercise: as I said, I am not offering an answer but if you are not familiar with that formula it is worth reading it, since it is relevant to the problem of finding the size of a set.

Memory & Computation Efficient Creation of Array with Repeated Elements

I am trying to find an efficient way to create a new array by repeating each element of an old array a different, specified number of times. I have come up with something that works, using array comprehensions, but it is not very efficient, either in memory or in computation:
LENGTH = 1e6
A = collect(1:LENGTH) ## arbitrary values that will be repeated specified numbers of times
NumRepeats = [rand(20:100) for idx = 1:LENGTH] ## arbitrary numbers of times to repeat each value in A
B = vcat([ [A[idx] for n = 1:NumRepeats[idx]] for idx = 1:length(A) ]...)
Ideally, what I would like would be a structure akin to the sparse matrix apparatus that Julia has but that would instead store data efficiently based on the indices where repeated values occur. Barring that, I would at least like an efficient way to create a vector such as B in the example above. I looked into the repeat() function, but as far as I can tell from the documentation and my experimentation with the function, it is just for repeating slices of an array the same number of times for each slice. What is the best way to approach this?
Sounds like you're looking for run-length encoding. There's an RLEVectors.jl package here: https://github.com/phaverty/RLEVectors.jl. Not sure how usable it is. You could also make your own data type fairly easily.
Thanks for trying RLEVectors.jl. Some features and optimizations had been languishing on master without a version bump. It can definitely be mixed with other vectors for element-wise arithmetic. I'll put the linear algebra operations on the feature request list. Any additional feature suggestions would be most welcome.
RLEVectors.jl has a rep function that works like R's and RLEVectors.inverse_ree is like StatsBase.inverse_rle, but it works on run ends rather than lengths.

Sapply (from R) equivalent for Julia?

Suppose I have an 2 dimensional array and I want to apply several functions to each of its columns. Ideally I would like to get the results back in the form of a matrix (with one row per function, and one column per input column).
The following code generates the values I want, but as an Array of Arrays.
A = rand(10,10)
[mapslices(f, A, 1) for f in [mean median iqr]]
Another similar example is here [Julia: use of pmap with matrices
Is there a better syntax for getting the results back in the form of a 2 dimensional array, instead of an array of arrays?
What I'd really like is something with a functionality similar to sapply from R. [https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html]
You can use an anonymous function as in
mapslices(t -> [mean(t), median(t), iqr(t)], A, 1)
but using comprehensions and splatting, as in your last example, is also fine. For very large arrays, you might want to avoid the temporary allocations introduced by transpose and splatting, but in most cases you don't have to pay attention to that.
After playing around a bit I found one option, but I am still interested in hearing if there are any better ways of doing it.
[[mapslices(f, A, 1)' for f in [mean median iqr]]...]

Resources