Suppose I have a two-dimensional array and I want to apply several functions to each of its columns. Ideally I would like to get the results back in the form of a matrix (with one row per function, and one column per input column).
The following code generates the values I want, but as an Array of Arrays.
A = rand(10,10)
[mapslices(f, A, 1) for f in [mean median iqr]]
A similar example is discussed in "Julia: use of pmap with matrices".
Is there a better syntax for getting the results back in the form of a 2 dimensional array, instead of an array of arrays?
What I'd really like is something with a functionality similar to sapply from R. [https://stat.ethz.ch/R-manual/R-devel/library/base/html/lapply.html]
You can use an anonymous function as in
mapslices(t -> [mean(t), median(t), iqr(t)], A, 1)
but using comprehensions and splatting, as in your last example, is also fine. For very large arrays, you might want to avoid the temporary allocations introduced by transpose and splatting, but in most cases you don't have to pay attention to that.
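For readers on Julia 1.x, where mapslices takes the dimension as the keyword dims, here is a minimal sketch of both approaches. It assumes only the Statistics standard library. iqr_sketch is a hypothetical stand-in for StatsBase.iqr, defined here so that the snippet needs no packages.

```julia
using Statistics  # mean, median, quantile

# Hypothetical stand-in for StatsBase.iqr (interquartile range)
iqr_sketch(x) = quantile(x, 0.75) - quantile(x, 0.25)

A = rand(10, 10)

# Anonymous function returning one value per statistic: a 3×10 result
R1 = mapslices(t -> [mean(t), median(t), iqr_sketch(t)], A; dims=1)

# Two-dimensional comprehension: rows follow fs, columns follow the columns of A
fs = [mean, median, iqr_sketch]
R2 = [f(view(A, :, j)) for f in fs, j in axes(A, 2)]
```

Both R1 and R2 are plain matrices with one row per function and one column per input column, so no transposing or splatting is needed afterwards.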
After playing around a bit I found one option, but I am still interested in hearing if there are any better ways of doing it.
[[mapslices(f, A, 1)' for f in [mean median iqr]]...]
I have the following function: problema_firma_emprestimo(r,w,r_emprestimo,posicao,posicao_banco), where all inputs are scalars.
This function returns three different matrices, using
return demanda_k_emprestimo,demanda_l_emprestimo,lucro_emprestimo
I need to run this function for a series of values of posicao_banco that are stored in a vector.
I'm doing this with a for loop, because I need three separate arrays, each storing one of the three outputs of the function, with the first dimension of each array corresponding to the index of posicao_banco. My code for this part is:
demanda_k_emprestimo = zeros(num_bancos,na,ny);
demanda_l_emprestimo = similar(demanda_k_emprestimo);
lucro_emprestimo = similar(demanda_k_emprestimo);
for i in eachindex(posicao_bancos)
    demanda_k_emprestimo[i,:,:], demanda_l_emprestimo[i,:,:], lucro_emprestimo[i,:,:] = problema_firma_emprestimo(r,w,r_emprestimo[i],posicao,posicao_bancos[i]);
end
Is there a fast and clean way of doing this using vectorized functions? Something like problema_firma_emprestimo.(r,w,r_emprestimo[i],posicao,posicao_bancos)? When I do this, I get a tuple with the result, but I can't find a good way of unpacking the answer.
Thanks!
Unfortunately, it's not easy to use broadcasting here, since then you will end up with output that is an array of tuples, instead of a tuple of arrays. I think a loop is a very good approach, and has no performance penalty compared to broadcasting.
I would suggest, however, that you organize your output array dimensions differently, so that i indexes into the last dimension instead of the first:
for i in eachindex(posicao_bancos)
demanda_k_emprestimo[:, :, i] , ...
end
This is because Julia arrays are column-major: the first index varies fastest in memory, so each slice [:, :, i] is a contiguous block and the output values are filled into the output arrays in the most efficient order. You could also consider making the output arrays into vectors of matrices, instead of 3D arrays.
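To make the two layouts concrete, here is a minimal sketch. foo is a hypothetical stand-in for the real function (it just returns three na×ny matrices), and X, Y, Z, ps are made-up names, so treat this as an illustration rather than a drop-in replacement.

```julia
# Hypothetical stand-in for the real function: returns three na×ny matrices
foo(p, na, ny) = (fill(p, na, ny), fill(2p, na, ny), fill(3p, na, ny))

na, ny = 4, 3
ps = [1.0, 2.0, 3.0]   # plays the role of posicao_bancos

# Loop version, with i indexing the last dimension
X = zeros(na, ny, length(ps))
Y = similar(X)
Z = similar(X)
for i in eachindex(ps)
    X[:, :, i], Y[:, :, i], Z[:, :, i] = foo(ps[i], na, ny)
end

# Broadcast version: an array of tuples, unpacked into vectors of matrices
results = foo.(ps, na, ny)
Xs = getindex.(results, 1)
Ys = getindex.(results, 2)
Zs = getindex.(results, 3)
```

Here X[:, :, i] and Xs[i] hold the same values. The broadcast version needs the extra getindex step to turn the array of tuples into separate containers, which is why the loop is arguably the cleaner choice.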
On a side note: since you are (or should be) creating an MWE for the sake of the people answering, it would be better to use shorter and less confusing variable names. Particularly for people who don't understand Portuguese (I'm guessing), your variable names are very long and confusing, and they make the code visually dense. Telling demanda_k_emprestimo and demanda_l_emprestimo apart at a glance is hard. The meaning of the variables is not important either, so it's better to just call them A and B or X and Y, and the function foo or something.
I am trying to find an efficient way to create a new array by repeating each element of an old array a different, specified number of times. I have come up with something that works, using array comprehensions, but it is not very efficient, either in memory or in computation:
LENGTH = 10^6 ## an integer (1e6 is a Float64 and cannot be used to index A)
A = collect(1:LENGTH) ## arbitrary values that will be repeated specified numbers of times
NumRepeats = [rand(20:100) for idx = 1:LENGTH] ## arbitrary numbers of times to repeat each value in A
B = vcat([ [A[idx] for n = 1:NumRepeats[idx]] for idx = 1:length(A) ]...)
Ideally, what I would like would be a structure akin to the sparse matrix apparatus that Julia has but that would instead store data efficiently based on the indices where repeated values occur. Barring that, I would at least like an efficient way to create a vector such as B in the example above. I looked into the repeat() function, but as far as I can tell from the documentation and my experimentation with the function, it is just for repeating slices of an array the same number of times for each slice. What is the best way to approach this?
Sounds like you're looking for run-length encoding. There's an RLEVectors.jl package here: https://github.com/phaverty/RLEVectors.jl. Not sure how usable it is. You could also make your own data type fairly easily.
Thanks for trying RLEVectors.jl. Some features and optimizations had been languishing on master without a version bump. It can definitely be mixed with other vectors for element-wise arithmetic. I'll put the linear algebra operations on the feature request list. Any additional feature suggestions would be most welcome.
RLEVectors.jl has a rep function that works like R's and RLEVectors.inverse_ree is like StatsBase.inverse_rle, but it works on run ends rather than lengths.
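If a plain expanded vector is all that is needed, the expansion step can also be written by hand. The following is a minimal sketch, not the RLEVectors.jl or StatsBase implementation; repeat_each is a hypothetical helper name. It allocates the output once, which avoids the many small temporary arrays and the splatting in the vcat version from the question.

```julia
# Repeat vals[i] exactly counts[i] times, allocating the output only once
function repeat_each(vals::AbstractVector, counts::AbstractVector{<:Integer})
    length(vals) == length(counts) || throw(ArgumentError("lengths must match"))
    out = Vector{eltype(vals)}(undef, sum(counts))
    pos = 1
    for (v, n) in zip(vals, counts)
        fill!(view(out, pos:pos+n-1), v)   # an empty range when n == 0
        pos += n
    end
    return out
end

repeat_each([10, 20, 30], [2, 0, 3])   # [10, 10, 30, 30, 30]
```

With the variables from the question, this would be called as B = repeat_each(A, NumRepeats).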
I'm optimizing a more complex code, but got stuck with this problem.
a<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
m<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
f<-array(sample(c(1:10),100,replace=TRUE),c(10,10))
g<-array(NA,c(10,10))
I need to use the values in a & m to index f and assign the value from f to g
i.e. g[1,1]<-f[a[1,1],m[1,1]] except for all the indexes, and as optimally/fast as possible
I could obviously make a for loop to do this for me, but that seems rather dumb and slow. It seems like I should be able to use something in the apply family, but I've had no luck figuring out how to do that. I do need to keep the data structured as it is here so that I can use matrix operations in different parts of my code. I've been searching for an answer to this but haven't found anything particularly helpful yet.
g[] <- f[cbind(c(a), c(m))]
This takes advantage of two facts: a matrix can be addressed as a vector, and a two-column matrix can be used as an index, with each row giving one (row, column) pair.
There seems to be general agreement that the l in "lapply" stands for list, the s in "sapply" stands for simplify and the r in "rapply" stands for recursively. But I could not find anything on the t in "tapply". I am now very curious.
Stands for table since tapply is the generic form of the table function. You can see this by comparing the following calls:
x <- sample(letters, 100, replace = TRUE)
table(x)
tapply(x, x, length)
although obviously tapply can do more than counting.
Also, some references that refer to "table-apply":
R and S Plus companion
Modern Applied Biostatistical Methods
I think of it as 'table'-apply since the result comes as a matrix/table/array and its dimensions are established by the INDEX arguments. An R table-classed object is really very similar in construction and behavior to an R matrix or array. The application is being performed in a manner similar to that of ave. Groups are first assembled on the basis of the "factorized" INDEX argument list (possibly with multiple dimensions) and a matrix or array is returned with the results of the FUN applied to each cross-classified grouping.
The other somewhat similar function is 'xtabs'. I keep thinking it should have a "FUN" argument, but what I'm probably forgetting at that point is really tapply.
tapply is sort of the odd man out. As far as I know, and as far as the R documentation for the apply functions goes, the 't' does not stand for anything, unlike the other apply functions which indicate the input or output options.
Ok, I'm stuck in a dumbness loop. I've read through the helpful ideas at "How to sort a dataframe by column(s)?", but need one more hint. I'd like a function that takes a matrix with an arbitrary number of columns and sorts by all columns in sequence. E.g., for a matrix foo with N columns, it should do the equivalent of foo[order(foo[,1],foo[,2],...foo[,N]),]. I am happy to use a with or by construction, and if necessary define the colnames of my matrix, but I can't figure out how to automate the collection of arguments to order (or to with).
Or, I should say, I could build the entire bloody string with paste and then call it, but I'm sure there's a more straightforward way.
The most elegant (for certain values of "elegant") way would be to turn it into a data frame, and use do.call:
foo[do.call(order, as.data.frame(foo)), ]
This works because a data frame is just a list of variables with some associated attributes, and can be passed to functions expecting a list.