I learnt that a vector is a sequence of data elements of the same basic type. Then what do we call a in the following code, given that it contains both a numeric and a character value?
a = c(1,"b")
is.vector(a)
[1] TRUE
So is the definition of vector wrong? I referred to this tutorial.
The tutorial simplifies and that can cause confusion. Its definition describes "basic vector types", but there are also "generic vectors".
From the language definition (which you should study):
2.1.1 Vectors
Vectors can be thought of as contiguous cells containing data. Cells
are accessed through indexing operations such as x[5]. More details
are given in Indexing.
R has six basic (‘atomic’) vector types: logical, integer, real,
complex, string (or character) and raw. The modes and storage modes
for the different vector types are listed in the following table.
typeof      mode        storage.mode
logical     logical     logical
integer     numeric     integer
double      numeric     double
complex     complex     complex
character   character   character
raw         raw         raw
Single numbers, such as 4.2,
and strings, such as "four point two" are still vectors, of length 1;
there are no more basic types. Vectors with length zero are possible
(and useful).
2.1.2 Lists
Lists (“generic vectors”) are another kind of data storage. Lists have
elements, each of which can contain any type of R
object, i.e. the elements of a list do not have to be of the same
type. List elements are accessed through three different indexing
operations. These are explained in detail in Indexing.
Lists are vectors, and the basic vector types are referred to as
atomic vectors where it is necessary to exclude lists.
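To see the quoted table in practice, here is a quick check of my own (not from the Language Definition). The values are arbitrary examples, one per atomic type; the last two lines show that a list counts as a vector but not as an atomic one.

```r
# One length-1 value of each of the six atomic types (arbitrary examples)
vals <- list(TRUE, 1L, 4.2, 1+2i, "four point two", as.raw(42))

sapply(vals, typeof)  # "logical" "integer" "double" "complex" "character" "raw"
sapply(vals, mode)    # "logical" "numeric" "numeric" "complex" "character" "raw"

length(4.2)           # 1: a single number is a vector of length 1
length(character(0))  # 0: zero-length vectors exist too

# vals itself is a list: a vector, but not an atomic one
is.vector(vals)       # TRUE
is.atomic(vals)       # FALSE
```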
From help("is.vector"):
If mode = "any", is.vector may return TRUE for the atomic modes, list
and expression. For any mode, it will return FALSE if x has any
attributes except names. [...]
(An expression is basically a list.)
Note that factors are not vectors; is.vector returns FALSE and as.vector converts a factor to a character vector for mode = "any".
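A small illustration of the is.vector rules quoted above. The attribute name "units" is just a hypothetical example; any attribute other than names has the same effect.

```r
x <- c(a = 1, b = 2)
is.vector(x)             # TRUE: names are the one attribute allowed
attr(x, "units") <- "cm" # "units" is an arbitrary, hypothetical attribute
is.vector(x)             # FALSE: x now has an attribute other than names

is.vector(expression(1 + 2))  # TRUE: expression vectors count as vectors

f <- factor(c("lo", "hi", "lo"))
is.vector(f)   # FALSE: a factor carries "levels" and "class" attributes
typeof(f)      # "integer": the underlying level codes
as.vector(f)   # "lo" "hi" "lo": converted to a plain character vector
```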
Finally, as @Henrik points out, c coerces all of its arguments to a common type. In your example, the 1 is therefore converted to the character string "1":
a<-c(1,"b")
typeof(a[1])
[1] "character"
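A short sketch contrasting the two containers for the example in the question: c() produces an atomic (character) vector, while list() produces a generic vector that keeps each element's own type.

```r
a <- c(1, "b")
typeof(a)              # "character": the 1 was coerced to "1"
a[1]                   # "1"
as.numeric(a[1]) + 1   # 2, but only after converting back explicitly

# To keep each element's own type, use a list (a generic vector) instead
l <- list(1, "b")
is.vector(l)           # TRUE
sapply(l, typeof)      # "double" "character"
```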
Related
I tried running some code to identify the type of the vector produced when combining different data types. Here is the code and the output I got. Can somebody explain why I get this output?
v<-c(1L,2,TRUE)
typeof(v)
Output: [1] "double"
This seems to be the rule:
When you combine values of different types, they are coerced to the most flexible type present, following the fixed order character → double → integer → logical (most to least flexible). For example, combining a character and an integer yields a character.
An atomic vector can only hold values of a single data type. If you put values of several different types into it, they get coerced to a common type; in your case, double.
If you want to keep the data types of the original values, you need to use a list. Lists do not have this restriction.
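A few quick checks that walk through the coercion order for the code in the question, followed by the list alternative. The inputs are my own small examples.

```r
typeof(c(TRUE, 1L))      # "integer": logical -> integer
typeof(c(1L, 2))         # "double": integer -> double
typeof(c(1L, 2, TRUE))   # "double": the most flexible type present wins
typeof(c(2, "a"))        # "character": double -> character
c(1L, 2, TRUE)           # 1 2 1 (TRUE became 1)

# A list keeps the original types of its elements
sapply(list(1L, 2, TRUE), typeof)   # "integer" "double" "logical"
```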
As far as I know, what most languages call a string, R calls a character vector. For example, "Alice" is not a string, it's a character vector of length 1. Similarly, c("Alice", "Bob") is a character vector of length 2. I cannot recall my IDE or any of my work with R's type system telling me that R has any internal concept of "strings".
Despite this, R's documentation frequently uses the word "string":
?paste and ?nchar frequently talk of "character strings".
Many "See Also" sections mention strings without any qualifier, e.g. ?paste, ?chartr, and ?agrep.
?strsplit mentions "substrings".
?agrep, ?toString, and ?adist talk about strings both in their titles and "Description" sections.
strsplit, strwidth, and toString have string or a shorthand for it in their names.
So does R actually have a concept of strings, or does it always mean exactly the same thing as "character vector"?
Converting my comment to an answer.
A description of character and string can be found in the R Language Definition:
R has six basic (‘atomic’) vector types: logical, integer, real, complex, string (or character) and raw. The modes and storage modes for the different vector types are listed in the following table.
typeof      mode        storage.mode
logical     logical     logical
integer     numeric     integer
double      numeric     double
complex     complex     complex
character   character   character
raw         raw         raw
[...]
String vectors have mode and storage mode "character". A single element of a character vector is often referred to as a character string.
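To make the distinction concrete, here is a small sketch using the names from the question: length() counts the strings (elements) in a character vector, while nchar() counts the characters inside each string.

```r
x <- "Alice"
typeof(x)   # "character"
length(x)   # 1: one element, i.e. one character string
nchar(x)    # 5: the number of characters in that string

y <- c("Alice", "Bob")
length(y)   # 2
nchar(y)    # 5 3: nchar() works element by element
y[2]        # "Bob", itself a character vector of length 1
```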
In R for Everyone by Jared P. Lander on p. 54 it says "...NULL is atomical and cannot exist within a vector. If used inside a vector, it simply disappears."
I understand that being atomic means being indivisible, and that NULL represents "nothingness" and is commonly used to handle returns that are undefined.
So is NULL "atomical" because it always has this one value of "nothingness", meaning something simply does not exist? Is that why R does not let it exist in a vector, and why assigning NULL to a list element actually removes that element?
I'm trying to wrap my head around this and find a more intuitive and comprehensive answer.
In my opinion talking about vectors as being "atomic" is more confusing than helpful. Instead, consider that R has a series of data types built into the language. They are given by definition and are distinct from one another.
For example, one such data type is "integer vector", which represents a sequence of integer values. Note that R does not have a data type of "integer". If we are talking about integer 5 in R, it is actually an integer vector of length 1.
Another built-in data type is NULL. There is a single object of type NULL, which is also called NULL. Since NULL is a type and an object, but not an integer value, it cannot be part of an integer vector.
Missing data in an integer vector are represented by NA. In this context NA is considered an integer value. Note that NA can also be a numeric value, a logical value, etc. NA is not a data type, but a value.
A complete list of built-in data types can be found in the R source code and also in the documentation, e.g. https://cran.r-project.org/doc/manuals/r-release/R-ints.html#SEXPTYPEs
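A short sketch of the NULL vs NA distinction described above, including the list behaviour the question asks about. The vectors and names are arbitrary examples.

```r
typeof(NULL)   # "NULL": its own type, not integer, double, etc.
length(NULL)   # 0

x <- c(1L, NULL, 3L)
length(x)      # 2: NULL contributes no element

y <- c(1L, NA, 3L)
length(y)      # 3: NA is a value and occupies a slot
typeof(y)      # "integer": here NA is the integer NA (NA_integer_)

l <- list(a = 1, b = 2)
l$b <- NULL    # assigning NULL with $ removes the element
names(l)       # "a"
length(list(a = 1, b = NULL))   # 2: a list element can still hold NULL
```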
Title essentially says it all. I'm having trouble figuring out the difference between initializing a vector with vector(mode="list") and a list with list().
There are some minor differences in the signatures: list() can take value arguments or tag = value arguments, whereas vector() cannot.
And then there's the following quote from the list() documentation:
Almost all lists in R internally are Generic Vectors
So is there any actual difference beside the fact that lists can be initialized with tags and values?
I'd say they're the same:
identical(list(),vector(mode="list", length=0))
## [1] TRUE
(see also this question about the confusing fact that a list is a vector in R: usually when R users refer to "vectors", they actually mean atomic vectors ...)
In my experience the most common use case for vector(mode="list",...) is when you want to initialize a list with length>0. vector(mode="list",10) might be a little more expressive than replicate(10,NULL). If you want to create a length-0 list I can't see any reason to use vector() instead of list().
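A minimal sketch of that pre-allocation use case: vector() creates a list of NULL placeholders of the requested length, which a loop then fills in. The squares are just example values.

```r
out <- vector(mode = "list", length = 3)
identical(out, list(NULL, NULL, NULL))   # TRUE: three NULL placeholders

# Fill the pre-allocated list in a loop
for (i in seq_along(out)) out[[i]] <- i^2
identical(out, list(1, 4, 9))            # TRUE
```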
My question stems from the usage of [[ and ]] in user-created functions to reference list elements. From what I can tell, [[ and ]] work the same way as [ and ] when applied to vectors.
Is this true of all other list operations though? As another example, I can use lapply on a vector.
It makes sense that this is true if a list is just a generalised vector, whose entries can be of differing modes.
EDIT: The one-and-a-half line answer is that both lists and atomic vectors are types of vectors, and subset exactly the same way.
This answer expands on the difference between lists and atomic vectors.
The best explanation of R's data structures, specifically between lists and atomic vectors, is (in my opinion) Hadley Wickham's new book:
http://adv-r.had.co.nz/Data-structures.html
Both lists and atomic vectors are one-dimensional data structures. However, atomic vectors are homogeneous and lists are heterogeneous. Lists can contain any type of vector, including other lists. Atomic vectors, on the other hand, are flat.
As far as subsetting using [] vs [[]] goes, [] is preserving for both lists and atomic vectors, whereas [[]] is simplifying. Thus, [] and [[]] are NOT the same, whether applied to lists OR atomic vectors. For example, [[]] will simplify a named vector by removing the name; subsetting a named vector by [] will keep the name. For a list, [[]] will pull out the contents of a list, and can return a number of simplified data structures. Subsetting a list by [] will always return a list (preserving).
Subsetting an atomic vector by [[]] returns a length one atomic vector. Subsetting a list by [[]] can return a number of different classes of data structures. This goes back to the fact that atomic vectors are homogeneous and lists are heterogeneous. However, according to Hadley, subsetting a list works exactly the same way as subsetting an atomic vector.
Take a look at this section of Hadley's book for further reference:
http://adv-r.had.co.nz/Subsetting.html#subsetting-operators
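A small sketch of the preserving vs simplifying behaviour described above, on a named atomic vector and a list. The last line shows that lapply() accepts an atomic vector, as the question observed.

```r
x <- c(a = 1, b = 2)
names(x["a"])     # "a": [ preserves the name
names(x[["a"]])   # NULL: [[ drops it

l <- list(a = 1:3, b = "z")
class(l["a"])     # "list": [ always returns a list
class(l[["a"]])   # "integer": [[ returns the element itself

# lapply() works on atomic vectors as well as on lists
lapply(c(1, 2), function(v) v * 10)   # a list holding 10 and 20
```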
Since I wasn't able to come up with any more counterexamples, I referred to the documentation on R's internals, and it appears your intuition is correct.
If you look at the section on the underlying C structure of R's data structures, SEXPTYPEs, lists are implied to be generic vectors:
19 VECSXP list (generic vector)