Error "dim(X) must have a positive length" in R

I want to compute the mean of the "Population" column of the built-in matrix state.x77. The code is:
apply(state.x77[,"Population"],2,FUN=mean)
#Error in apply(state.x77[, "Population"], 2, FUN = mean) :
# dim(X) must have a positive length
How can I prevent this error? If I use the $ sign instead:
apply(state.x77$Population,2,mean)
# Error in state.x77$Population : $ operator is invalid for atomic vectors
What is an atomic vector?

To expand on joran's comments, consider:
> is.vector(state.x77[,"Population"])
[1] TRUE
> is.matrix(state.x77[,"Population"])
[1] FALSE
So your Population data is now no different from any other vector, like 1:10, which has neither columns nor rows to apply against. It is just a series of numbers with no more advanced structure or dimension. E.g.:
> apply(1:10,2,mean)
Error in apply(1:10, 2, mean) : dim(X) must have a positive length
This means you can just call the mean function directly on the matrix subset you have selected, e.g.:
> mean(1:10)
[1] 5.5
> mean(state.x77[,"Population"])
[1] 4246.42
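If you do want to keep using apply, here is a minimal sketch of two alternatives, assuming the built-in state.x77 from the datasets package (the names pop and pop_mat are only for illustration). Subsetting with drop = FALSE keeps the dim attribute, so apply has a column to work on:

```r
# Selecting one column with [ , "name"] drops the dim attribute by default
pop <- state.x77[, "Population"]
is.null(dim(pop))            # TRUE: a plain named vector, so apply() fails

# drop = FALSE keeps a 50 x 1 matrix, so apply() over columns works
pop_mat <- state.x77[, "Population", drop = FALSE]
dim(pop_mat)                 # 50 1
apply(pop_mat, 2, mean)

# colMeans() computes every column mean of the full matrix in one call
colMeans(state.x77)["Population"]
```

For a single column, plain mean() as shown above is still the simplest option.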
To explain 'atomic' vector more, see the R Language Definition (and this gets a bit complex, so hold on to your hat):
R has six basic (‘atomic’) vector types: logical, integer, real,
complex, string (or character) and raw.
http://cran.r-project.org/doc/manuals/r-release/R-lang.html#Vector-objects
So atomic in this instance is referring to vectors as the basic building blocks of R objects (like atoms make up everything in the real world).
If you read R's inline help by entering ?"$" as a command, you will find it says:
‘$’ is only valid for recursive objects, and is only
discussed in the section below on recursive objects.
Since vectors (like 1:10) are basic building blocks ("atomic"), with no recursive sub-elements, trying to use $ to access parts of them will not work.
Since your matrix (state.x77) is essentially just a vector with some dimensions, like:
> str(matrix(1:10,nrow=2))
int [1:2, 1:5] 1 2 3 4 5 6 7 8 9 10
...you also can't use $ to access sub-parts.
> state.x77$Population
Error in state.x77$Population : $ operator is invalid for atomic vectors
But you can access subparts using [ and names like so:
> state.x77[,"Population"]
Alabama Alaska Arizona...
3615 365 2212...
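To see the atomic/recursive distinction directly, here is a small sketch (df is just an illustrative name). Converting the matrix to a data frame gives a recursive object, and $ works on that:

```r
# A matrix is an atomic vector with a dim attribute
is.atomic(state.x77)      # TRUE
is.recursive(state.x77)   # FALSE

# A data frame is a list of columns, so it is recursive and supports $
df <- as.data.frame(state.x77)
is.recursive(df)          # TRUE
mean(df$Population)
```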

Related

How can we do operations inside indexing operations in R?

For example, let's imagine the following vector in R:
a <- 1:8; k <- 2
What I would like to do is get, for example, all elements between 2k and 3k, namely:
interesting_elements <- a[2k:3k]
Error: unexpected symbol in "test[2k"
interesting_elements <- a[(2k):(3k)]
Error: unexpected symbol in "test[2k"
Unfortunately, indexing vectors this way does not work in R, and the only way to do such an operation seems to be to create a specific variable k' storing the result of 2k, and another k'' storing the result of 3k.
Is there another way to do operations when indexing, without creating a new variable each time?
R does not interpret 2k as scalar multiplication. You need to use explicit arithmetic operators.
If you are trying to access elements 4 to 6 of a, then you need to use * and parentheses:
a[(2*k):(3*k)]
[1] 4 5 6
If you leave off the parentheses, the sequence operator : is evaluated first and the multiplications afterwards:
2*k:3*k
[1] 8 12
Is the same as
(k:3)*2*k
[1] 8 12
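If the precedence of : is easy to forget, seq() is an alternative that sidesteps it. A short sketch (by_colon and by_seq are illustrative names only):

```r
a <- 1:8
k <- 2

# Parenthesised form: compute both endpoints first, then build the sequence
by_colon <- a[(2 * k):(3 * k)]

# seq() takes the endpoints as ordinary arguments, so no precedence surprises
by_seq <- a[seq(2 * k, 3 * k)]

by_colon   # 4 5 6
by_seq     # 4 5 6
```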

How to test if an object is a vector in R

I want to test if an object is a vector in R. I'm confused as to why
is.vector(c(0.1))
returns TRUE and so does
is.vector(0.1)
I would like it to return false when it is just a number and true when it is a vector. Can anyone offer any help on this please?
Many thanks in advance.
In R, a single number or string never exists on its own. Each one is a vector of length 1, or is embedded in some more complex structure.
is.vector(c(0.1)) and is.vector(0.1) are absolutely identical in R.
That is also why length("this is a string/character") returns 1: length() here counts the number of elements in the vector.
You can see this if you type "this is a string/character" into the R console:
It returns [1] "this is a string/character". The [1] is the index of the first element printed, and here the vector has length 1.
So you have to call nchar("this is a string/character") to get the length of the first element, the character string, which returns 26.
nchar(c("this is a string/character", "and this another string"))
## [1] 26 23
## nchar is vectorized as you see ...
This is an important difference from Python, where strings and numbers can stand alone.
So len("this") returns 4 in Python, whereas len(["this"]) returns 1 (one element in the list, so the length of the list is 1).
As already mentioned by @RHertel, R considers c(0.1) a vector of length 1. You may want to test for length as well. E.g.
> x <- 1
> y <- 1:2
> is.vector(x) & length(x) > 1
[1] FALSE
> is.vector(y) & length(y) > 1
[1] TRUE
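If you need this test in several places, it could be wrapped in a small function. This is only a sketch: is_multi_element_vector is a made-up name, and it uses && because both sides are single logical values. Note that is.vector() also returns TRUE for lists and FALSE for anything with attributes other than names:

```r
# Hypothetical helper: TRUE only for vectors with more than one element
is_multi_element_vector <- function(x) {
  is.vector(x) && length(x) > 1
}

is_multi_element_vector(0.1)              # FALSE: length 1
is_multi_element_vector(c(0.1, 0.2))      # TRUE
is_multi_element_vector(list(1, "a"))     # TRUE: is.vector() accepts lists too
is_multi_element_vector(matrix(1:4, 2))   # FALSE: the dim attribute disqualifies it
```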

How could a matrix hold different data types when coercing from a list?

As commonly known, a matrix can only hold one data type.
But it seems like I can coerce a list into a matrix just fine.
tmp <- matrix(list(1, "a"))
class(tmp)
[1] "matrix"
str(tmp)
List of 2
$ : num 1
$ : chr "a"
attr(*, "dim")= int [1:2] 2 1
Is this behavior documented?
(This is my first time answering so formatting tips and suggestions are welcome)
This is because it's not a matrix of vectors, it's a matrix of lists! To be clearer, the one 'type' the matrix is holding is the data structure type called list. And lists are not one-dimensional in R but n-dimensional! In R, this means that lists' ability to handle different data types with ease allows other functions like matrix to break their own limits and handle multiple data types. In fact, to paraphrase my professor, precisely this n-dimensional power of R's version of lists is one of the top 10 reasons R handles Big Data better than Java or Python.
As @nrussel pointed out, though, the matrix() documentation only hints at this behavior and there is no exact documentation of matrix-list combinations. However, I will show some simple code that helps clarify this behavior, then show how it is supported by various websites. My sample code is inspired by another question answered by 42-[1] on Stack Overflow:
tmp <- matrix(list(1, "a"))
str(tmp)
List of 2
$ : num 1
$ : chr "a"
attr(*, "dim")= int [1:2] 2 1
class(tmp)
"matrix"
is.matrix(tmp)
is.list(tmp)
is.array(tmp)
TRUE TRUE TRUE
tmp <- matrix(list(c(1,2), c("a","b")))
str(tmp)
List of 2
$ : num [1:2] 1 2
$ : chr [1:2] "a" "b"
attr(*, "dim")= int [1:2] 2 1
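One way to check what str() is reporting here is to inspect the object's underlying type and pull out single cells. This is a rough sketch (plain is an illustrative name): the object is stored as a list that carries a dim attribute, and [[ with two indices returns the element in a given cell.

```r
tmp <- matrix(list(1, "a"))

typeof(tmp)     # "list": the storage type is list
dim(tmp)        # 2 1: two rows, one column

# [[row, col]] extracts the content of one cell
tmp[[1, 1]]     # 1
tmp[[2, 1]]     # "a"

# Giving an ordinary list a dim attribute also makes it a matrix
plain <- list(1, "a")
dim(plain) <- c(2, 1)
is.matrix(plain)   # TRUE
```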
So what is going on here? Although beginner R tutorials[2] sometimes oversimplify lists[3], lists in R are not like vectors and are not exactly 1-dimensional (1D). Unlike in other languages, lists are really 1 x (n x m) dimensional (1nD). (Aside: lists can sometimes even be 1 x (n x m ... n+1), but I will explain this later.) As in most languages, a list is a collection of data structures.
So again, what is going on in the above example? Look at the output above for the first str(tmp), is.matrix, and is.list. class() and is.matrix tell us the overall object is a matrix. str() tells us, however, that inside the matrix is a list. str() also tells us that each element holds only 1 value, meaning it's only a 1 x 1 list. So it's a 2 x 1 matrix (see the "dim" attribute) of 1 x 1 lists. This is why is.list() gives the value TRUE: technically there are only lists in the matrix.
Now let's talk more about lists as we look at the second example in the code, the second str(tmp). As in most languages, a list is a collection of data structures. The parts of a list can simply be vectors, as in a data.frame, but the components are allowed to have different lengths and data types. However, unlike in other languages, a list in R can also be a much more complicated structure (paraphrased from ramnathv[7]). Looking above, see that [1:2]? The [number:number] part tells us that each internal list in our second example is 1 x 2. However, str(tmp) is still telling us that our matrix is still only 2 x 1. This is because the matrix only sees the lists as individual elements, and the matrix is behaving like a 2-element vector. Combining the list and matrix observations, overall the matrix of lists has a dimension of 1 x (2 x (1 x 2)). It's two 1 x 2's. This 1 x (2 x (1 x 2)) format shows what I meant by lists being 1 x (n x m) dimensional and what ramnathv meant by a list's 'complicated structure'. Since the behavior of combining lists with arrays might still not be clear, let's delve a bit deeper than one normally might into this "complicated structure" of lists.
==== Deeper meaning of Lists being nD allow matrix of lists to have ====
Paul Murrell[6], when talking about lists' complicated structure, noted:
In R lists act as containers. Unlike (regular) atomic vectors, the contents of a list are not restricted to a single mode and can encompass any mixture of data types. Lists are sometimes called recursive vectors, because a list can contain other lists. This makes them fundamentally different from (regular) atomic vectors.
By atomic vector, Paul is talking about what most other people call vectors. (In R, regular numerical or character vectors are called atomic vectors in order to distinguish them from generalized vectors like lists.) To illustrate what Paul Murrell means in the above quote, let's look at another, more complicated list, and then look at recursive lists combined with matrices.
tmp1 <- matrix(list(c(1,2,3),c("a","b","c","d"),as.factor("soup")))
str(tmp1)
List of 3
$ : num [1:3] 1 2 3
$ : chr [1:4] "a" "b" "c" "d"
$ : Factor w/ 1 level "soup": 1
attr(*, "dim")= int [1:2] 3 1
Recursive Matrix Lists
tmp2 <- matrix(list(c(1,2,3),c("a","b","c"),list(c(1,2,3),c("a","b","c"))))
str(tmp2)
tmp3 <- matrix(list(c(1,2,3),c("a","b","c"),matrix(list(c(1,2,3),c("a","b","c")))))
str(tmp3)
I will leave the output for you to discover.
In the first example, I hope we see just how different lists in R can be from atomic vectors. In this list we see not only that lists can have different 1 x n sizes and completely different types, but also that a list can hold data structures (e.g. a factor). A factor, believe it or not, is not a data type but the most generalized form of vector that R has[4] (a vector that does not care whether its data is homogeneous in type[5]). I am going to have to wave my hands over the math of what str(tmp1) tells us, but overall the matrix list is now (1 x 3*) or 1 x ((1 x 3) + (1 x 4) + (1 x 1)). Yet the matrix itself thinks it's only a [...]. In this list-of-lists example, the list allows the matrix to act like a true generalized table of structured data that is not the same by variable type, number of rows, or variable class. It is miles simpler than trying to create the same generalized structure in Java or Python. I hope you can see how this brings the conversation full circle to my point at the beginning...
The TL;DR answer: lists in R are n-dimensional, not 1-dimensional! In R, lists' ability to handle different data types with ease allows other functions like matrix to break their own limitations and handle multiple data types.
However, a list can also contain further lists, as shown by the last two pieces of code. I have literally posted two different ways a list could be used to pass the same variables into itself as a new section of the list. If I were to draw what the data table would look like, this recursion of lists is like an infinity mirror because it has a constantly folding dimension (n + 1), or is 1 x (n x m ... n+1). I only bring this up to extend my answer and show how a list-matrix has been used in R to allow a regular 2 x 2 matrix to represent an infinite matrix with only finite numbers (at least according to my professor, that is what he uses it for).
I hope this helped you understand lists better. Sites like http://adv-r.had.co.nz/Data-structures.html have a typo when they call a list simply a vector. I hope this cleared up the confusion you had about lists being 1D data structures.
P.S. I could also use help with the formatting of Stack Overflow. It's a bit overwhelming.
[1]: stackoverflow.com/questions/30007890/how-to-create-a-matrix-of-lists-in-r
[2]: en.wikibooks.org/wiki/R_Programming/Data_types#Lists
[3]: adv-r.had.co.nz/Data-structures.html
[4]: www.r-bloggers.com/data-types-part-3-factors/
[5]: www.tutorialspoint.com/r/r_data_types.htm
[6]: www.stat.auckland.ac.nz/~paul/ItDT/HTML/node64.html#SECTION001345000000000000000
[7]: https://ramnathv.github.io/pycon2014-r/learn/structures.html

as(x, 'double') and as.double(x) are inconsistent

x <- 1:10
str(x)
# int [1:10] 1 2 3 4 5 6 7 8 9 10
str(as.double(x))
# num [1:10] 1 2 3 4 5 6 7 8 9 10
str(as(x, 'double'))
# int [1:10] 1 2 3 4 5 6 7 8 9 10
I'd be surprised if there was a bug in R with something so basic as type conversion. Is there a reason for this inconsistency?
as is for coercing to a new class, and double technically isn't a class but rather a storage.mode.
y <- x
storage.mode(y) <- "double"
identical(x,y)
[1] FALSE
> identical(as.double(x),y)
[1] TRUE
The argument "double" is handled as a special case by as and will attempt to coerce to the class numeric, which the class integer already inherits, therefore there is no change.
is.numeric(x)
[1] TRUE
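To keep the three notions apart (type, mode, and implicit class), it may help to print them side by side. A small sketch, where y is just an illustrative name for the converted vector:

```r
x <- 1:10
y <- as.double(x)

typeof(x); typeof(y)   # "integer" then "double": the storage type differs
mode(x);   mode(y)     # "numeric" for both: mode lumps integer and double together
class(x);  class(y)    # "integer" then "numeric": the implicit class differs
is.numeric(x)          # TRUE: is.numeric() tests the mode
```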
Not so fast...
While the above made sense, there is some further confusion. From ?double:
It is a historical anomaly that R has two names for its floating-point
vectors, double and numeric (and formerly had real).
double is the name of the type. numeric is the name of the mode and
also of the implicit class. As an S4 formal class, use "numeric".
The potential confusion is that R has used mode "numeric" to mean
‘double or integer’, which conflicts with the S4 usage. Thus
is.numeric tests the mode, not the class, but as.numeric (which is
identical to as.double) coerces to the class.
Therefore as should really change x according to the documentation... I will investigate further.
The plot is thicker than whipped cream and cornflour soup...
Well, if you debug as, you find out that what eventually happens is that the following method gets created rather than using the c("ANY","numeric") signature for the coerce generic which would call as.numeric:
function (from, strict = TRUE)
if (strict) {
class(from) <- "numeric"
from
} else from
So actually, class<- gets called on x and this eventually means R_set_class is called from coerce.c. I believe the following part of the function determines the behaviour:
...
else if(!strcmp("numeric", valueString)) {
setAttrib(obj, R_ClassSymbol, R_NilValue);
if(IS_S4_OBJECT(obj)) /* NULL class is only valid for S3 objects */
do_unsetS4(obj, value);
switch(TYPEOF(obj)) {
case INTSXP: case REALSXP: break;
default: PROTECT(obj = coerceVector(obj, REALSXP));
nProtect++;
}
...
Note the switch statement: it breaks out without doing coercion in the case of integers and real values.
Bug or not?
Whether or not this is a bug depends on your point of view. Integers are numeric in one sense, as confirmed by is.numeric(x) returning TRUE, but strictly speaking they are not of class numeric. On the other hand, since integers are promoted to double automatically in mixed arithmetic, one may view them as conceptually the same. There are two major differences: i) integers require less storage space, which may be significant for larger vectors, and ii) when interacting with external code that has stricter type discipline, conversion costs may come into play.
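The storage difference in point i) can be checked roughly with object.size(). This is a sketch only: n, ints, and dbls are illustrative names, and exact byte counts depend on the platform, so none are stated here.

```r
n <- 1e6
ints <- integer(n)   # one million integers
dbls <- double(n)    # one million doubles

object.size(ints)
object.size(dbls)

# Ratio of the two sizes; doubles take roughly twice the space of integers
as.numeric(object.size(dbls)) / as.numeric(object.size(ints))
```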
as(x,"double"):
Methods are pre-defined for coercing any object to one of the basic datatypes. For example, as(x, "numeric") uses the existing as.numeric function. These built-in methods can be listed by showMethods("coerce").
These functions manage the relations that allow coercing an object to
a given class.
as.double(x):
as.double is a generic function. It is identical to as.numeric. Methods should return an object of base type "double". as.double creates, coerces to, or tests for a double-precision vector.

Using nchar function on factor variables

Can somebody explain to me what's going on here? When a variable is coded as a factor and nchar coerces it to character, why can't the function count the number of characters correctly?
> x <- c("73210", "73458", "73215", "72350")
> nchar(x)
[1] 5 5 5 5
>
> x <- factor(x)
> nchar(x)
[1] 1 1 1 1
>
> nchar(as.character(x))
[1] 5 5 5 5
thanks.
It is because with a factor, your data is represented by the integer codes 1, 2, etc. What you mean to do is count the characters of the levels:
> nchar(levels(x)[x])
[1] 5 5 5 5
see the warning section of ?factor:
The interpretation of a factor depends on both the codes and the
‘"levels"’ attribute. Be careful only to compare factors with the
same set of levels (in the same order). In particular,
‘as.numeric’ applied to a factor is meaningless, and may happen by
implicit coercion. To transform a factor ‘f’ to approximately its
original numeric values, ‘as.numeric(levels(f))[f]’ is recommended
and slightly more efficient than ‘as.numeric(as.character(f))’.
nchar(levels(x))
The other answers are correct, I think, that the issue is that nchar is examining the underlying integer codes, not the labels. However, what I think most directly addresses your question is this piece from ?nchar:
The internal equivalent of the default method of as.character is
performed on x (so there is no method dispatch)
I'm not 100% sure, but I suspect this means that the coercion that takes place in nchar is not the same thing that happens when you directly call as.character, most likely going directly to the integer codes, rather than "smartly" looking at the labels.
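To make the codes-versus-labels distinction visible, here is a short sketch using the same data as the question (codes is an illustrative name). It only uses explicit conversions, since what nchar() does with a factor passed directly may differ between R versions:

```r
x <- factor(c("73210", "73458", "73215", "72350"))

levels(x)                  # the labels, in sorted order
codes <- as.integer(x)     # the underlying integer codes
codes                      # 2 4 3 1: each value's position in levels(x)

# Counting characters of the labels requires converting to character first
nchar(as.character(x))     # 5 5 5 5
nchar(levels(x)[x])        # 5 5 5 5
```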

Resources