recycling higher dimensional arrays - r

I was surprised to find that R's recycling didn't apply in higher dimensions:
> str(Z)
num [1:5, 1:100, 1:10] 1.02 0.989 2.555 1.167 -0.835 ...
> str(w)
num [1:5, 1:100] 1.43 7.84 6.13 2.91 2.8 ...
> Z + w
Error in Z + w : non-conformable arrays
whereas I expected the 2d matrix w to be recycled along the 3rd dimension of Z. I get the same error with a matrix w with dimensions like the last 2 of Z (as with numpy's broadcasting rule). I figured when recycling R would simply flatten each array in the order of the dimensions (C style) and add them, then reshape them back, which would work in however many dimensions. Is there a right way to recycle a matrix like I'm trying to? I guess I could do the flattening and reshaping myself by manipulating the dim attributes, but obviously would prefer not to do the work myself.
The language definition has this line: "That is, if for instance you add c(1, 2, 3) to a six-element vector then you will really add c(1, 2, 3, 1, 2, 3)." Can anyone who has looked under the hood tell me whether R is literally creating a new longer vector from the shorter one, to conform to the other operand, and then applying the operator? I had been assuming recycling was more space-efficient. If not, then I might as well achieve the higher-dimensional recycling by creating a 3-way array from the matrix. I imagine there is some package for multiway arrays/tensors, but I would prefer to use base.

Implicit recycling only works when one operand is a plain vector; two arrays with different dim attributes are rejected as non-conformable. One solution for matrix recycling is the sweep function (see ?sweep). In your case, try
sweep(Z,1:2,w,FUN="+")
The second argument specifies which dimensions of Z will be preserved.
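A small sketch of the sweep call on toy data, assuming Z and w are shrunk to dimensions 2 x 3 x 4 and 2 x 3 (illustrative values, not the asker's data). Because R stores arrays in column-major order, stripping the dim attribute from w with as.vector should give the same result through ordinary vector recycling, and array(w, dim(Z)) is the explicit "build the 3-way array" route mentioned in the question.

```r
# Toy stand-ins for the arrays in the question (values are illustrative).
Z <- array(as.numeric(1:24), dim = c(2, 3, 4))
w <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, ncol = 3)

# sweep() applies w to every slice Z[, , k]; margins 1:2 are the dims w spans.
r_sweep <- sweep(Z, 1:2, w, FUN = "+")

# Column-major storage means the first two dims vary fastest, so a plain
# vector of length 2 * 3 is recycled once per slice along the third dim.
r_vector <- Z + as.vector(w)

# Explicitly expanding w to a 3-way array gives the same result again.
r_array <- Z + array(w, dim = dim(Z))
```

For a matrix that matches the last two dimensions of Z instead, the margin would be 2:3; the as.vector shortcut does not carry over to that case, because the recycled values would no longer line up with the slices.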

Related

Julia: Turn Vector into multiple m x n matrices without a loop

Let's say I have a vector V, and I want to either turn this vector into multiple m x n matrices, or get multiple m x n matrices from this Vector V.
For the most basic example: Turn V = collect(1:75) into 3 5x5 matrices.
As far as I am aware, this can be done by first using reshape, as in reshape(V, 5, :), and then looping through the result. Is there a better way in Julia without using a loop?
If possible, a solution that can easily switch between row-major and column-major results is preferable.
TL;DR
m, n, n_matrices = 4, 2, 5
V = collect(1:m*n*n_matrices)
V = reshape(V, m, n, :)
V = permutedims(V, [2,1,3])
display(V)
From my limited knowledge about Julia:
When doing V = collect(1:m*n), you initialize a contiguous array in memory. From V you wish to create a container of m by n matrices. You can achieve this by doing reshape(V, m, n, :), then you can access the first matrix with V[:,:,1]. The "container" in this case is just another array (thus you have a three dimensional array), which in this case we interpret as "an array of matrices" (but you could also interpret it as a box). You can then transpose every matrix in your array by swapping the first two dimensions like this: permutedims(V, [2,1,3]).
How this works
From what I understand, n-dimensional arrays in Julia are contiguous arrays in memory when you don't do any "skipping" (e.g. V[1:2:end]). For example, the 2 x 4 matrix A:
1 3 5 7
2 4 6 8
is in memory just 1 2 3 4 5 6 7 8. You simply interpret the data in a specific way, where the first two numbers make up the first column, the next two numbers make up the second column, and so on. The reshape function simply specifies how you want to interpret the data in memory. So if we did reshape(A, 4, 2), we would interpret the numbers in memory as "the first four values make up the first column, the next four values make up the second column", and we would get:
1 5
2 6
3 7
4 8
We are basically doing the same thing here, but with an extra dimension.
From my observations, it also seems that permutedims reallocates memory in this case. Also, feel free to correct me if I am wrong.
Old answer:
I don't know much about Julia, but in Python using NumPy I would have done something like this:
reshape(V, :, m, n)
EDIT: As @BatWannaBe states, the result is technically one array (but three-dimensional). You can always interpret a three-dimensional array as a container of 2D arrays, which from my understanding is what you are asking for.

How can we do operations inside indexing operations in R?

For example, let's imagine following vector in R:
a <- 1:8; k <- 2
What I would like to do is get, for example, all elements between 2k and 3k, namely:
interesting_elements <- a[2k:3k]
Error: unexpected symbol in "test[2k"
interesting_elements <- a[(2k):(3k)]
Error: unexpected symbol in "test[2k"
Unfortunately, indexing vectors this way does not work in R, and the only way I can do such an operation seems to be to create a variable k' storing the result of 2k, and another k'' storing the result of 3k.
Is there another way to do operations when indexing, without creating a new variable each time?
R does not interpret 2k as scalar multiplication, as some other languages do. You need to use explicit arithmetic operators.
If you are trying to access the 4th to 6th elements of a, then you need to use * and parentheses:
a[(2*k):(3*k)]
[1] 4 5 6
If you leave off the parentheses, the sequence operator : is evaluated first, and the multiplications are applied afterwards:
2*k:3*k
[1] 8 12
This is the same as
(k:3)*2*k
[1] 8 12
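The precedence rules above can be checked directly. The sketch below reuses a and k from the question; seq() is shown as an alternative spelling that sidesteps the precedence question, and the variable names picked, picked_seq and unparenthesized are made up for illustration.

```r
a <- 1:8
k <- 2

# Parentheses force the multiplications to happen before ':' builds the range.
picked <- a[(2 * k):(3 * k)]

# seq() takes the endpoints as ordinary arguments, so no parentheses are needed.
picked_seq <- a[seq(2 * k, 3 * k)]

# Without parentheses, ':' binds tighter than '*', giving 2 * (k:3) * k.
unparenthesized <- 2 * k:3 * k
```

Both picked and picked_seq hold the 4th to 6th elements of a, while unparenthesized holds the two values 8 and 12 from the answer.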

Identity matrix in Julia

I'm trying to construct the identity matrix in Julia 1.1. After looking at the documentation I found that I could compute a 4x4 Identity matrix as follows:
julia> Id4 =1* Matrix(I, 4, 4)
4×4 Array{Int64,2}:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
Is this the most julianic way of coding it or is there a better/shorter way, as it is an often used matrix?
Given using LinearAlgebra, the most julianic way of expressing the identity matrix is:
I
This answer may seem trite, but it is also kind of profound. The whole point of the operator I is that in the vast majority of cases where users want an identity matrix, it is not necessary to actually instantiate that matrix.
Let's say you want a 1000x1000 identity matrix. Why waste time building the entire matrix when you could just use I? Note that sizeof(I) evaluates to 1 (i.e. the size of the object is 1 byte). All functions in base Julia (including LinearAlgebra) understand what I is, and can use it appropriately without having to waste time building the actual matrix it represents first.
Now, it may be the case that for some reason you need to specify the type of the elements of your identity matrix. Note:
julia> I
UniformScaling{Bool}
true*I
so in this case, you are using a notional identity matrix with a diagonal of true and off-diagonal of false. This is sufficient in many cases, even if your other matrices are Int or Float64. Internally, Julia will use methods that specialize on the types. However, if you want to specify your identity matrix to contain integers or floats, use:
julia> 1I
UniformScaling{Int64}
1*I
julia> 1.0I
UniformScaling{Float64}
1.0*I
Note that sizeof(1I) evaluates to 8, indicating the notional Int64 types of the members of that matrix.
Also note that you can use e.g. 5I if you want a notional matrix with 5 on the diagonal and 0 elsewhere.
In some cases (and these cases are much rarer than many might think), you may need to actually build the matrix. In this case, you can use e.g.:
Matrix(1I, 3, 3) # Identity matrix of Int type
Matrix(1.0I, 3, 3) # Identity matrix of Float64 type
Matrix(I, 3, 3) # Identity matrix of Bool type
Bogumił has also pointed out in the comments that if you are uncomfortable with implying the type of the output in the first argument of the constructors above, you can also use the (slightly more verbose):
Matrix{Int}(I, 3, 3) # Identity matrix of Int type
Matrix{Float64}(I, 3, 3) # Identity matrix of Float64 type
Matrix{Bool}(I, 3, 3) # Identity matrix of Bool type
and specify the type explicitly.
But really, the only times you would probably need to do this are as follows:
When you want to input an identity matrix into a function in a package written in such a way that the input must be a concrete matrix type.
When you want to start out with an identity matrix but then mutate it in place into something else via one or several transformations.

How could a matrix hold different data types when coercing from a list?

As commonly known, a matrix can only hold one data type.
But it seems like I can coerce a list into a matrix just fine.
tmp <- matrix(list(1, "a"))
class(tmp)
[1] "matrix"
str(tmp)
List of 2
$ : num 1
$ : chr "a"
attr(*, "dim")= int [1:2] 2 1
Is this behavior documented?
(This is my first time answering so formatting tips and suggestions are welcome)
This is because it's not a matrix of vectors, it's a matrix of lists! To be more clear, the one 'type' the matrix is holding is the data structure type called list. And lists are not one-dimensional in R but n-dimensional! In R, this means that lists' ability to handle different data types with ease allows other functions like matrix to break their own limits and handle multiple data types. In fact, to paraphrase my professor, it is precisely this n-dimensional power of R's version of lists that is one of the top 10 reasons R handles Big Data better than Java or Python.
As @nrussel pointed out, though, the matrix() documentation only hints at this behavior and there is no exact documentation of matrix-list combinations; however, I will show some simple code that helps clarify this behavior, and then show how it is supported by various websites. My sample code is inspired by another question answered by 42-[1] on Stack Overflow:
tmp <- matrix(list(1, "a"))
str(tmp)
List of 2
$ : num 1
$ : chr "a"
attr(*, "dim")= int [1:2] 2 1
class(tmp)
"matrix"
is.matrix(tmp)
is.list(tmp)
is.array(tmp)
TRUE TRUE TRUE
tmp <- matrix(list(c(1,2), c("a","b")))
str(tmp)
List of 2
$ : num [1:2] 1 2
$ : chr [1:2] "a" "b"
attr(*, "dim")= int [1:2] 2 1
So what is going on here? Although beginner R tutorials[2] sometimes oversimplify lists[3], lists in R are not like vectors and are not exactly 1-dimensional (1D). Unlike in other languages, lists are really 1 x (n x m) dimensional (1nD). (Aside: lists can sometimes even be 1 x (n x m ... n+1), but I will explain this later.) As in most languages, a list is a collection of data structures.
So again, what is going on in the above example? Look at the output above for the first str(tmp), is.matrix, and is.list. class() and is.matrix tell us the overall object is a matrix. str() tells us, however, that inside the matrix is a list. str() also tells us that each list is only a list of 1 value, meaning it's only a 1 x 1 list. So it's a 1 x 2 matrix of 1 x 1 lists. This is why is.list() gives the value TRUE: technically there are only lists in the matrix.
Now let's talk more about lists as we look at the second example in the code, the second str(tmp). As in most languages, a list is a collection of data structures. The parts of a list can simply be vectors, as in a data.frame, but with the columns allowed to have different lengths and data types. However, unlike in other languages, a list in R can also be a much more complicated structure (paraphrased from ramnathv[7]). Looking above, see that [1:2]? The [number:number] part tells us that each internal list in our second example is 1 x 2. However, str(tmp) is still telling us that our matrix is only 1 x 2. This is because the matrix only sees the lists as individual elements, and the matrix is behaving like a 1 x 2 vector. Combining the list and matrix observations, overall the matrix of lists has a dimension of 1 x (2 x (1 x 2)): it's two 1 x 2's. This 1 x (2 x (1 x 2)) format shows what I meant by lists being 1 x (n x m) dimensional and what ramnathv meant by the 'complicated structure' of lists. Since the behavior of combining lists with arrays might still not be clear, let's delve a bit deeper than one normally might into this "complicated structure" of lists.
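One hedged way to check these observations in code, using only base R: typeof() reports how the object is stored, dim() reports the shape the matrix sees, and [[ ]] pulls out the contents of a single cell. The variable names storage, shape, cell_num, cell_chr and cell_lengths are made up for this sketch.

```r
tmp <- matrix(list(c(1, 2), c("a", "b")))

storage <- typeof(tmp)    # how the object is stored: "list"
shape <- dim(tmp)         # the dim attribute that makes it a matrix: 2 1

# [[row, col]] extracts the contents of one cell, whatever its type.
cell_num <- tmp[[1, 1]]   # numeric vector of length 2
cell_chr <- tmp[[2, 1]]   # character vector of length 2

# Each cell keeps its own type and length, independent of the other cells.
cell_lengths <- sapply(tmp, length)
```

Read this way, the object is a list that carries a dim attribute, which is why is.matrix() and is.list() can both return TRUE for it.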
==== Deeper meaning of Lists being nD allow matrix of lists to have ====
Paul Murrell[6], when talking about the complicated structure of lists, noted:
In R lists act as containers. Unlike (regular) atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from (regular) atomic vectors.
By atomic vector, Paul is talking about what most other people call vectors. (In R, regular numerical or character vectors are called atomic vectors in order to distinguish them from generalized vectors like lists.) To illustrate what Paul Murrell means in the above quote, let's look at another, more complicated list, and then look at recursive vector lists when combined with matrices.
tmp1 <- matrix(list(c(1,2,3),c("a","b","c","d"),as.factor("soup")))
str(tmp1)
List of 3
$ : num [1:3] 1 2 3
$ : chr [1:4] "a" "b" "c" "d"
$ : Factor w/ 1 level "soup": 1
attr(*, "dim")= int [1:2] 3 1
Recursive Matrix Lists
tmp2 <- matrix(list(c(1,2,3),c("a","b","c"),list(c(1,2,3),c("a","b","c"))))
str(tmp2)
tmp3 <- matrix(list(c(1,2,3),c("a","b","c"),matrix(list(c(1,2,3),c("a","b","c")))))
str(tmp3)
I will leave the output for you to discover.
In the first example, we see, I hope, just how different lists in R can be from atomic vectors. In this list we see not only that lists can have different 1 x n sizes, and not only completely different types, but that a list can hold data structures (e.g. a factor). A factor, believe it or not, is not a data type but the most generalized form of vector that R has[4]; a vector that cares not whether its data is homogeneous in type[5]. I am going to have to wave my hands with the math of what str(tmp1) tells us, but overall the matrix list is now (1 x 3*) or 1 x ((1 x 3) + (1 x 4) + (1 x 1)). Yet the matrix itself thinks it's only a In this list of lists example, the list allows the matrix to act like a true generalized table of structured data that is not the same by variable type, number of rows, or variable class. It is miles simpler than trying to create the same generalized structure in Java or Python. I hope you can see how it brings the conversation full circle to my point at the beginning....
The TLDR Answer: Lists in R are n-dimensional, not 1-dimensional! In R, lists' ability to handle different data types with ease allows other functions like matrix to break their own limitations and handle multiple data types.
However, a list can also contain a list of itself, as shown by the last two pieces of code. I have posted two different ways a list could be used to pass the same variables into itself as a new section of the list. If I were to draw what the data table would look like, this recursion of lists is like an infinity mirror, because it has a constantly folding dimension (n + 1), or is 1 x (n x m ... n+1). I only bring this up to extend my answer and show how a list-matrix has been used in R to allow a regular 2 x 2 matrix to represent an infinite matrix with only finite numbers (at least according to my professor, that is what he uses it for).
I hope this helped you understand lists better. Sites like http://adv-r.had.co.nz/Data-structures.html have a typo when they call a list simply just a vector. I hope this cleared up the confusion you had about lists being 1D data structures.
P.S. I could also use help with the formatting on Stack Overflow. It's a bit overwhelming.
[1]: stackoverflow.com/questions/30007890/how-to-create-a-matrix-of-lists-in-r
[2]: en.wikibooks.org/wiki/R_Programming/Data_types#Lists
[3]: adv-r.had.co.nz/Data-structures.html
[4]: www.r-bloggers.com/data-types-part-3-factors/
[5]: www.tutorialspoint.com/r/r_data_types.htm
[6]: www.stat.auckland.ac.nz/~paul/ItDT/HTML/node64.html#SECTION001345000000000000000
[7]: https://ramnathv.github.io/pycon2014-r/learn/structures.html

r Error dim(X) must have a positive length?

I want to compute the mean of "Population" in the built-in matrix state.x77. The code is:
apply(state.x77[,"Population"],2,FUN=mean)
#Error in apply(state.x77[, "Population"], 2, FUN = mean) :
# dim(X) must have a positive length
How can I prevent this error? If I use the $ sign:
apply(state.x77$Population,2,mean)
# Error in state.x77$Population : $ operator is invalid for atomic vectors
What is an atomic vector?
To expand on joran's comments, consider:
> is.vector(state.x77[,"Population"])
[1] TRUE
> is.matrix(state.x77[,"Population"])
[1] FALSE
So, your Population data is now no different from any other vector, like 1:10, which has neither columns nor rows to apply against. It is just a series of numbers with no more advanced structure or dimension. E.g.
> apply(1:10,2,mean)
Error in apply(1:10, 2, mean) : dim(X) must have a positive length
This means you can just use the mean function directly on the matrix subset you have selected, e.g.:
> mean(1:10)
[1] 5.5
> mean(state.x77[,"Population"])
[1] 4246.42
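If apply() is still wanted, for example because the same code later handles several columns, one option is to keep the matrix structure with drop = FALSE when subsetting. A short sketch using the built-in state.x77; colMeans() is shown as another base R route to the same number, and the variable names are invented for illustration.

```r
# Subsetting one column normally drops to a plain vector; drop = FALSE
# keeps a 50 x 1 matrix, so apply() has a column margin to work over.
pop_matrix <- state.x77[, "Population", drop = FALSE]
mean_via_apply <- apply(pop_matrix, 2, FUN = mean)

# The direct call on the dropped vector, as in the answer above.
mean_direct <- mean(state.x77[, "Population"])

# colMeans() computes every column mean of the matrix at once.
mean_via_colmeans <- colMeans(state.x77)["Population"]
```

All three approaches should agree; the apply() and colMeans() results additionally carry the column name "Population".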
To explain 'atomic' vector more, see the R FAQ again (and this gets a bit complex, so hold on to your hat)...
R has six basic (‘atomic’) vector types: logical, integer, real,
complex, string (or character) and raw.
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Vector-objects
So atomic in this instance is referring to vectors as the basic building blocks of R objects (like atoms make up everything in the real world).
If you read R's inline help by entering ?"$" as a command, you will find it says:
‘$’ is only valid for recursive objects, and is only
discussed in the section below on recursive objects.
Since vectors (like 1:10) are basic building blocks ("atomic"), with no recursive sub-elements, trying to use $ to access parts of them will not work.
Since your matrix (state.x77) is essentially just a vector with some dimensions, like:
> str(matrix(1:10,nrow=2))
int [1:2, 1:5] 1 2 3 4 5 6 7 8 9 10
...you also can't use $ to access sub-parts.
> state.x77$Population
Error in state.x77$Population : $ operator is invalid for atomic vectors
But you can access subparts using [ and names like so:
> state.x77[,"Population"]
Alabama Alaska Arizona...
3615 365 2212...
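To see the difference between atomic and recursive objects in code: is.atomic() and is.recursive() are base R predicates, and converting the matrix to a data frame (a list of columns) is one way to make $ available. A minimal sketch; pop_df and the other variable names are invented here.

```r
# A matrix is an atomic vector with a dim attribute, so '$' is rejected.
matrix_is_atomic <- is.atomic(state.x77)
dollar_result <- tryCatch(state.x77$Population, error = function(e) e)

# A data frame is a list of columns (a recursive object), so '$' works.
pop_df <- as.data.frame(state.x77)
df_is_recursive <- is.recursive(pop_df)
pop_from_df <- pop_df$Population

# Name-based '[' indexing on the matrix returns the same values.
pop_from_matrix <- state.x77[, "Population"]
```

Whether the conversion is worth it depends on the rest of the code; for a single column, the [ , "Population"] form on the matrix is usually enough.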
