Why did a show up in my table name? - r

I ran this in R:
a <- factor(c("A","A","B","A","B","B","C","A","C"))
And then I made a table
results <- table(a)
but when I run
> attributes(results)
$dim
[1] 3
$dimnames
$dimnames$a
But I'm confused: why does a show up in my attributes? I've programmed in Java before, and I thought variable names weren't supposed to show up inside your functions.

R functions can see not only the data you pass to them, but also the actual call that was run to invoke them. So when you run table(a), the table() function not only sees the values of a, it can also see that those values came from a variable named a.
By default, table() likes to name each dimension in the resulting table. If you don't pass explicit names in the call via the dnn= parameter, table() will look back at the call, turn the variable name into a character string, and use that value for the dimension name.
So after table() has run, the result has no direct connection to the variable a; table() merely used the name of that variable as a character label for the results.
Many functions in R do this. For example this is similar to how plot(height~weight, data=data.frame(height=runif(10), weight=runif(10))) knows to use the names "weight" and "height" for the axis labels on the plot.
Here's a simple example to show one way this can be accomplished.
paramnames <- function(...) {
as.character(substitute(...()))
}
paramnames(a,b,x)
# [1] "a" "b" "x"
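To check that the label is only a copied string, here is a minimal sketch. The label "grade" and the object names default_label, relabelled, custom_label and counts are made up for illustration.

```r
a <- factor(c("A", "A", "B", "A", "B", "B", "C", "A", "C"))

# default: the dimension is labelled with the deparsed argument, "a"
results <- table(a)
default_label <- names(dimnames(results))

# dnn= supplies the label explicitly, so the variable name is not used
relabelled <- table(a, dnn = "grade")
custom_label <- names(dimnames(relabelled))

# the table keeps its label and its counts after the variable is removed
rm(a)
counts <- as.vector(results)
```

Here default_label is "a" and custom_label is "grade"; removing a changes neither table.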

I think the only answer is because the designers wanted it that way. It seems reasonable to label table objects with the names of variables that formed the margins:
> b <- c(1,1,1,2,2,2, 3,3,3)
> table(a, b)
   b
a   1 2 3
  A 2 1 1
  B 1 2 0
  C 0 0 2
R was intended as a clone of S, and S was intended as a tool for working statisticians. R also has a handy function for working with table objects, as.data.frame:
> as.data.frame(results)
  a Freq
1 A    4
2 B    3
3 C    2
If you want to build a function that performs the same sort of labeling or that otherwise retrieves the name of the object passed to your function then there is the deparse(substitute(.))-maneuver:
> myfunc <- function(x) { nam <- deparse(substitute(x)); print(nam)}
> myfunc(z)
[1] "z"
> str(z)
Error in str(z) : object 'z' not found
So "z" doesn't even need to exist. Highly "irregular" if you ask me. If you "ask" myfunc what its argument list looks like you get the expected answer:
> formals(myfunc)
$x
But that is a list with an R-name for its single element x. R names are language elements, whereas the names function will retrieve it as a character value, "x", which is not a language element:
> names(formals(myfunc))
[1] "x"
R has some of the aspects of Lisp (interpreted, functional (usually)) although the dividing line between its language functions and the data objects seems less porous to me, but I'm not particularly proficient in Lisp.
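The deparse(substitute(.)) maneuver can be combined with dnn= to write a function that labels its own output the way table() does. This is only a sketch; count_values, label, grades and "mark" are hypothetical names, not anything from base R.

```r
# label defaults to the text of whatever expression was passed as x
count_values <- function(x, label = deparse(substitute(x))) {
    table(x, dnn = label)
}

grades <- c("A", "B", "A")

# the caller's variable name becomes the dimension label
auto_label <- names(dimnames(count_values(grades)))

# an explicit label overrides it
manual_label <- names(dimnames(count_values(grades, label = "mark")))
```

Putting the deparse(substitute(x)) call in the default argument lets the caller override the label without the function needing a separate code path.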

Related

Change name of column vector in R [duplicate]

so we know that R have list() variable, and also know that R has function call names() to give names for variable. For example :
a=30
names(a)="number"
a
# number
# 30
But now, I want to give a list variable a name, like this :
b=list()
names(b)="number"
and it returns error message like this :
Error in names(b) = "number" :
'names' attribute [1] must be the same length as the vector [0]
What I have suppose to do? I do this because I need many list variables. Or, do you have another way so I can make many list variables without playing with its name?
Since @akrun doesn't need any more points, here is an example showing how you can assign names to a list:
lst <- list(a="one", b="two", c=c(1:3))
names(lst)
[1] "a" "b" "c"
names(lst) <- c("x", "y", "z")
> lst
$x
[1] "one"
$y
[1] "two"
$z
[1] 1 2 3
It seems as though you are interested in labeling the object itself rather than the elements in it. The key is that the names attribute for a list object is necessarily assigned to its elements. One option, since you said you have many list objects, is to store the lists in a big list and then you can assign names to the big list, and the elements within the list-objects can be named too.
allLists <- list('number' = list())
> allLists
$number
list()
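If there are many such lists, the big list can be built and filled by name. A minimal sketch, assuming three made-up list names ("number", "letter", "flag"):

```r
list_names <- c("number", "letter", "flag")

# one empty slot per name, labelled up front
allLists <- setNames(vector("list", length(list_names)), list_names)

# give every slot its own empty list
for (nm in list_names) {
    allLists[[nm]] <- list()
}

# the inner lists can have named elements too
allLists$number$value <- 30
```

Each inner list is then reached through its name, for example allLists$number or allLists[["letter"]].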
Another option is to use the label feature in the Hmisc package. It modifies most common objects in R to have a subclass "labelled", so whenever you print the list it shows the label. It is good for documentation and for organizing the workspace a bit better, but one caveat: it's very easy to accidentally cast labelled objects to a non-labelled class and to confuse methods that don't think to search for more than one class.
library(Hmisc)
p <- list()
label(p) <- 'number'
> p
number
list()
Another option is to make the "name" of your list object an actual element of the list. You'll see in a lot of complex R data structures, this is the preferred way of storing labels, titles, or names when such a need arises and isn't met by the base R data structure.
b <- list('name' = 'number')
The last possibility is that you need a placeholder to store the "names" attribute of the elements you haven't yet populated the list with. If the elements are of known length and of known type, you can allocate such a vector using e.g. numeric(1) a sort-of "primed" vector which can be named. If you don't know the data structure of your output, I would not use this approach since it can be a real memory hog to "build" data structures in R.
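A minimal sketch of that placeholder idea, assuming three numeric results whose names are known in advance (slot_names and the values assigned are made up for illustration):

```r
slot_names <- c("mean", "sd", "n")

# numeric(3) allocates three zeros; setNames attaches the labels up front
res <- setNames(numeric(length(slot_names)), slot_names)
res["mean"] <- 4.2
res["n"] <- 10

# the same idea for a list whose elements are filled in later
res_list <- setNames(vector("list", length(slot_names)), slot_names)
res_list[["sd"]] <- 1.5
```

Slots that have not been filled yet stay at 0 in the numeric vector and at NULL in the list.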
Other possibilities are
as.list(a)
# $`number`
# [1] 30
# or
setNames(list(unname(a)),'number')
# $`number`
# [1] 30
# or named list with named vector
setNames(list(a), 'number')
# $`number`
# number
# 30

In R, how to use values as the variable names

I know the function get can help you transform values to variable names, like get(test[1]). But I find it is not compatible when the values is in a list format. Look at below example:
> A = c("obj1$length","obj1$width","obj1$height","obj1$weight")
> obj1 <- NULL
> obj1$length=c(1:4);obj1$width=c(5:8);obj1$height=c(9:12);obj1$weight=c(13:16)
> get(A[1])
Error in get(A[1]) : object 'obj1$length' not found
In this case, how can I retrieve the variable name?
get doesn't work like that; you need to specify the variable and the environment (the list is coerced to one) separately:
get("length",obj1)
[1] 1 2 3 4
To do it with the data you have, you need to use eval and parse:
eval(parse(text=A[1]))
[1] 1 2 3 4
However, I suggest you rethink your approach to the problem as get, eval and parse are blunt tools that can bite you later.
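One way to avoid eval and parse for strings of this shape is to split on the "$" and index the list by name. This is a sketch that assumes every string has exactly the form object$element; parts and value are illustrative names.

```r
obj1 <- list(length = 1:4, width = 5:8, height = 9:12, weight = 13:16)
A <- c("obj1$length", "obj1$width", "obj1$height", "obj1$weight")

# split "obj1$length" into the object name and the element name
parts <- strsplit(A[1], "$", fixed = TRUE)[[1]]

# fetch the object by name, then the element by name
value <- get(parts[1])[[parts[2]]]
```

Here value holds 1 2 3 4, the same result as eval(parse(text = A[1])), without evaluating arbitrary text.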
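I think the eval() function will do the trick, but only in combination with parse(): eval() applied to a character string just returns the string itself.
eval(parse(text = A[1]))
[1] 1 2 3 4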
You could also find useful this simple function I implemented (based in the commonly used combo eval, parse, paste):
evaluate<-function(..., envir=.GlobalEnv){ eval(parse(text=paste( ... ,sep="")), envir=envir) }
It concatenates and evaluates several character type objects. If you want it to be used inside another function, add at the beginning of your function
envir <- environment()
and use it like this:
evaluate([some character objects], envir=envir)
Try, for example
myvariable1<-"aaa"; myvariable2<-"bbb"; aaa<-15; bbb<-3
evaluate(myvariable1," * ",myvariable2)
I find it very useful when I have to evaluate similar expressions with several variables, or when I want to create variables with automatically generated names.
for(i in 1:100){evaluate("variable",i,"<-2*",i)}
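If the goal is only to have a hundred values reachable by generated names, a named list is a common alternative to building assignments from text. A minimal sketch (the list name variables is an assumption, chosen to mirror the loop above):

```r
# one element per generated name, holding 2 * i
variables <- setNames(lapply(1:100, function(i) 2 * i),
                      paste0("variable", 1:100))

# look a value up by its generated name
seventh <- variables[["variable7"]]
```

This keeps the workspace free of a hundred separate objects, and the values can be processed together with lapply or sapply.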

Hiding function names from ls() results - to find a variable name more quickly

When we have defined tens of functions - probably to develop a new package - it is hard to find out the name of a specific variable among many function names through ls() command.
In most of cases we are not looking for a function name - we already know they exist - but we want to find what was the name we assigned to a variable.
Any idea to solve it is highly appreciated.
If you want a function to do this, you need to play around a bit with the environment that ls() looks in. In normal usage, the implementation below will work by listing objects in the parent frame of the function, which will be the global environment if called at the top level.
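lsnofun <- function(name = parent.frame()) {
    obj <- ls(name = name)
    # look each name up in the environment that ls() listed
    is_fun <- vapply(obj, function(x) is.function(get(x, envir = as.environment(name))), logical(1))
    obj[!is_fun]
}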
> ls()
[1] "bar" "crossvalidate" "df"
[4] "f1" "f2" "foo"
[7] "lsnofun" "prod"
> lsnofun()
[1] "crossvalidate" "df" "f1"
[4] "f2" "foo" "prod"
I've written this so you can pass in the name argument of ls() if you need to call this way down in a series of nested function calls.
Note also we need to get() the objects named by ls() when we test if they are a function or not.
Rather than sorting through the objects in your global environment and trying to separate data objects from functions, it would be better to store the functions in a different environment so that ls() does not list them (by default it only lists things in the global environment). They are still accessible and can be listed if desired.
The best way to do this is to make a package with the functions in it. This is not as hard as it sometimes seems, just use package.skeleton to start.
Another alternative is to use the save function to save all your functions to a file, delete them from the global environment, then use the attach function to attach this file (and therefore all the functions) to the search path.
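A variant of the attach idea that skips the intermediate file is to collect the functions in an environment and attach that. This is only a sketch; helpers, double_it, add_one and the search-path name "my_helpers" are made up for illustration.

```r
# collect the functions in their own environment instead of the workspace
helpers <- new.env()
helpers$double_it <- function(x) 2 * x
helpers$add_one <- function(x) x + 1

# attach() copies the contents onto the search path under the given name
attach(helpers, name = "my_helpers")
rm(helpers)

# the functions are callable, but live in "my_helpers", not the workspace
result <- double_it(21)
attached_names <- sort(ls("my_helpers"))

# remove the entry from the search path when it is no longer needed
detach("my_helpers")
```

While attached, ls() at the top level lists only the remaining workspace objects, and ls("my_helpers") lists the functions.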
So perhaps
ls()[!ls()%in%lsf.str()]
Josh O'Brien's suggestion was to use
setdiff(ls(), lsf.str())
That function, after some conversions and checks, calls
x[match(x, y, 0L) == 0L]
which is pretty close to what I suggested in the first place, but is packed nicely in the function setdiff.
The following function lsos was previously posted on Stack Overflow. It gives a nice ordering of objects loaded in your R session based on their size. The output of the function contains the class of the object, which you can subsequently filter to get the non-function objects.
source("lsos.R")
A <- 1
B <- 1
C <- 1
D <- 1
E <- 1
F <- function(x) print(x)
L <- lsos(n=Inf)
L[L$Type != "function",]
This returns:
> lsos(n=Inf)
Type Size Rows Columns
lsos function 5184 NA NA
F function 1280 NA NA
A numeric 48 1 NA
B numeric 48 1 NA
C numeric 48 1 NA
D numeric 48 1 NA
E numeric 48 1 NA
Or, with the filter, the function F is not returned:
> L[L$Type != "function",]
Type Size Rows Columns
A numeric 48 1 NA
B numeric 48 1 NA
C numeric 48 1 NA
D numeric 48 1 NA
E numeric 48 1 NA
So you just want the variable names, not the functions? This will do that.
ls()[!sapply(ls(), function(x) is.function(get(x)))]
I keep this function in my .Rprofile. I don't use it often but it's great when I have several environments, functions and objects in my global environment. Clearly not as elegant as BenBarnes' solution but I never have to remember the syntax and can just call lsa() as needed. This also allows me to list specific environments, e.g. lsa(e).
lsa <- function(envir = .GlobalEnv) {
    # first class of the object called x, looked up in envir
    obj_type <- function(x) {
        class(get(x, envir = envir))[1]
    }
    lis <- data.frame(sapply(ls(envir = envir), obj_type))
    lis$object_name <- rownames(lis)
    names(lis)[1] <- "class"
    names(lis)[2] <- "object"
    # drop the row names (base R stand-in for plyr's unrowname)
    rownames(lis) <- NULL
    return(lis)
}

How to correctly use lists?

Brief background: Many (most?) contemporary programming languages in widespread use have at least a handful of ADTs [abstract data types] in common, in particular,
string (a sequence comprised of characters)
list (an ordered collection of values), and
map-based type (an unordered array that maps keys to values)
In the R programming language, the first two are implemented as character and vector, respectively.
When I began learning R, two things were obvious almost from the start: list is the most important data type in R (because it is the parent class for the R data.frame), and second, I just couldn't understand how they worked, at least not well enough to use them correctly in my code.
For one thing, it seemed to me that R's list data type was a straightforward implementation of the map ADT (dictionary in Python, NSMutableDictionary in Objective C, hash in Perl and Ruby, object literal in Javascript, and so forth).
For instance, you create them just like you would a Python dictionary, by passing key-value pairs to a constructor (which in Python is dict not list):
x = list("ev1"=10, "ev2"=15, "rv"="Group 1")
And you access the items of an R List just like you would those of a Python dictionary, e.g., x['ev1']. Likewise, you can retrieve just the 'keys' or just the 'values' by:
names(x) # fetch just the 'keys' of an R list
# [1] "ev1" "ev2" "rv"
unlist(x) # fetch just the 'values' of an R list
# ev1 ev2 rv
# "10" "15" "Group 1"
x = list("a"=6, "b"=9, "c"=3)
sum(unlist(x))
# [1] 18
but R lists are also unlike other map-type ADTs (from among the languages I've learned anyway). My guess is that this is a consequence of the initial spec for S, i.e., an intention to design a data/statistics DSL [domain-specific language] from the ground-up.
Three significant differences between R lists and mapping types in other languages in widespread use (e.g., Python, Perl, JavaScript):
first, lists in R are an ordered collection, just like vectors, even though the values are keyed (ie, the keys can be any hashable value not just sequential integers). Nearly always, the mapping data type in other languages is unordered.
second, lists can be returned from functions even though you never passed in a list when you called the function, and even though the function that returned the list doesn't contain an (explicit) list constructor (Of course, you can deal with this in practice by wrapping the returned result in a call to unlist):
x = strsplit(LETTERS[1:10], "") # passing in an object of type 'character'
class(x) # returns 'list', not a vector of length 2
# [1] list
A third peculiar feature of R's lists: it doesn't seem that they can be members of another ADT, and if you try to do that then the primary container is coerced to a list. E.g.,
x = c(0.5, 0.8, 0.23, list(0.5, 0.2, 0.9))
class(x)
# [1] "list"
My intention here is not to criticize the language or how it is documented; likewise, I'm not suggesting there is anything wrong with the list data structure or how it behaves. All I'm after is to correct my understanding of how they work so I can use them correctly in my code.
Here are the sorts of things I'd like to better understand:
What are the rules which determine when a function call will return a list (e.g., strsplit expression recited above)?
If I don't explicitly assign names to a list (e.g., list(10,20,30,40)) are the default names just sequential integers beginning with 1? (I assume, but I am far from certain that the answer is yes, otherwise we wouldn't be able to coerce this type of list to a vector w/ a call to unlist.)
Why do these two different operators, [], and [[]], return the same result?
x = list(1, 2, 3, 4)
both expressions return "1":
x[1]
x[[1]]
why do these two expressions not return the same result?
x = list(1, 2, 3, 4)
x2 = list(1:4)
Please don't point me to the R Documentation (?list, R-intro)--I have read it carefully and it does not help me answer the type of questions I recited just above.
(lastly, I recently learned of and began using an R Package (available on CRAN) called hash which implements conventional map-type behavior via an S4 class; I can certainly recommend this Package.)
Just to address the last part of your question, since that really points out the difference between a list and vector in R:
Why do these two expressions not return the same result?
x = list(1, 2, 3, 4); x2 = list(1:4)
A list can contain any other class as each element. So you can have a list where the first element is a character vector, the second is a data frame, etc. In this case, you have created two different lists. x has four vectors, each of length 1. x2 has 1 vector of length 4:
> length(x[[1]])
[1] 1
> length(x2[[1]])
[1] 4
So these are completely different lists.
R lists are very much like a hash map data structure in that each index value can be associated with any object. Here's a simple example of a list that contains 3 different classes (including a function):
> complicated.list <- list("a"=1:4, "b"=1:3, "c"=matrix(1:4, nrow=2), "d"=search)
> lapply(complicated.list, class)
$a
[1] "integer"
$b
[1] "integer"
$c
[1] "matrix"
$d
[1] "function"
Given that the last element is the search function, I can call it like so:
> complicated.list[["d"]]()
[1] ".GlobalEnv" ...
As a final comment on this: it should be noted that a data.frame is really a list (from the data.frame documentation):
A data frame is a list of variables of the same number of rows with unique row names, given class ‘"data.frame"’
That's why columns in a data.frame can have different data types, while columns in a matrix cannot. As an example, here I try to create a matrix with numbers and characters:
> a <- 1:4
> class(a)
[1] "integer"
> b <- c("a","b","c","d")
> d <- cbind(a, b)
> d
a b
[1,] "1" "a"
[2,] "2" "b"
[3,] "3" "c"
[4,] "4" "d"
> class(d[,1])
[1] "character"
Note how I cannot change the data type in the first column to numeric because the second column has characters:
> d[,1] <- as.numeric(d[,1])
> class(d[,1])
[1] "character"
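By contrast, a data frame built from the same two vectors keeps each column's type, because it is a list of columns. A minimal sketch (the name df is an assumption; stringsAsFactors = FALSE is set so the result is the same on older R versions):

```r
a <- 1:4
b <- c("a", "b", "c", "d")
df <- data.frame(a, b, stringsAsFactors = FALSE)

# a data frame is a list with one element per column
df_is_list <- is.list(df)
n_columns <- length(df)

# each column keeps its own type
column_classes <- sapply(df, class)

# list-style extraction returns the original integer vector
first_column <- df[["a"]]
```

Here column_classes is "integer" for a and "character" for b, whereas cbind(a, b) coerced both columns to character.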
Regarding your questions, let me address them in order and give some examples:
1) A list is returned if and when the return statement adds one. Consider
R> retList <- function() return(list(1,2,3,4)); class(retList())
[1] "list"
R> notList <- function() return(c(1,2,3,4)); class(notList())
[1] "numeric"
R>
2) Names are simply not set:
R> retList <- function() return(list(1,2,3,4)); names(retList())
NULL
R>
3) They do not return the same thing. Your example gives
R> x <- list(1,2,3,4)
R> x[1]
[[1]]
[1] 1
R> x[[1]]
[1] 1
where x[1] returns a list of length one holding the first element of x, whereas x[[1]] returns the first element itself, a numeric vector of length one (every scalar in R is a vector of length one).
4) Lastly, the two are different because they create, respectively, a list containing four scalars and a list with a single element (that happens to be a vector of four elements).
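The points about names and about [ versus [[ can be checked directly. A small sketch (the variable names are illustrative):

```r
x <- list(10, 20, 30, 40)

# no default names are created
no_names <- is.null(names(x))

# unlist() works by position, so it does not need names
flat <- unlist(x)

# [ keeps the container, [[ extracts the element
single_bracket_class <- class(x[1])
double_bracket_class <- class(x[[1]])
```

Here single_bracket_class is "list" and double_bracket_class is "numeric".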
Just to take a subset of your questions:
This article on indexing addresses the question of the difference between [] and [[]].
In short, [[]] selects a single item from a list and [] returns a list of the selected items. In your example, x = list(1, 2, 3, 4), item 1 is a single number, so x[[1]] returns a single 1 and x[1] returns a list with only one value.
> x = list(1, 2, 3, 4)
> x[1]
[[1]]
[1] 1
> x[[1]]
[1] 1
One reason lists work as they do (ordered) is to address the need for an ordered container that can contain any type at any node, which vectors do not do. Lists are re-used for a variety of purposes in R, including forming the base of a data.frame, which is a list of vectors of arbitrary type (but the same length).
Why do these two expressions not return the same result?
x = list(1, 2, 3, 4); x2 = list(1:4)
To add to @Shane's answer, if you wanted to get the same result, try:
x3 = as.list(1:4)
Which coerces the vector 1:4 into a list.
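A quick comparison of the three constructions, as a sketch. One detail to be aware of: as.list(1:4) holds integers while list(1, 2, 3, 4) holds doubles, so the two are equal in value but not identical.

```r
x  <- list(1, 2, 3, 4)
x2 <- list(1:4)
x3 <- as.list(1:4)

# x and x3 have four elements, x2 has one
element_counts <- c(length(x), length(x2), length(x3))

# same values element by element
same_values <- isTRUE(all.equal(x, x3))

# but different storage types (double vs integer)
strictly_identical <- identical(x, x3)
```

Here element_counts is 4 1 4, same_values is TRUE and strictly_identical is FALSE.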
Just to add one more point to this:
R does have a data structure equivalent to the Python dict in the hash package. You can read about it in this blog post from the Open Data Group. Here's a simple example:
> library(hash)
> h <- hash( keys=c('foo','bar','baz'), values=1:3 )
> h[c('foo','bar')]
<hash> containing 2 key-value pairs.
bar : 2
foo : 1
In terms of usability, the hash class is very similar to a list. But the performance is better for large datasets.
You say:
For another, lists can be returned
from functions even though you never
passed in a List when you called the
function, and even though the function
doesn't contain a List constructor,
e.g.,
x = strsplit(LETTERS[1:10], "") # passing in an object of type 'character'
class(x)
# => 'list'
And I guess you suggest that this is a problem(?). I'm here to tell you why it's not a problem :-). Your example is a bit simple, in that when you do the string-split, you have a list with elements that are 1 element long, so you know that x[[1]] is the same as unlist(x)[1]. But what if the result of strsplit returned results of different length in each bin. Simply returning a vector (vs. a list) won't do at all.
For instance:
stuff <- c("You, me, and dupree", "You me, and dupree",
"He ran away, but not very far, and not very fast")
x <- strsplit(stuff, ",")
xx <- unlist(strsplit(stuff, ","))
In the first case (x : which returns a list), you can tell what the 2nd "part" of the 3rd string was, eg: x[[3]][2]. How could you do the same using xx now that the results have been "unraveled" (unlist-ed)?
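Running the example shows what is lost. This sketch reuses stuff from above; piece_counts, third_second and origin are illustrative names.

```r
stuff <- c("You, me, and dupree", "You me, and dupree",
           "He ran away, but not very far, and not very fast")
x <- strsplit(stuff, ",")
xx <- unlist(x)

# the list remembers how many pieces each string produced
piece_counts <- lengths(x)

# the 2nd part of the 3rd string
third_second <- x[[3]][2]

# the flat vector only has the total, so the grouping must be rebuilt
total_pieces <- length(xx)
origin <- rep(seq_along(x), piece_counts)
```

The grouping in origin can only be rebuilt because x is still available; from xx alone it is gone.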
This is a very old question, but I think that a new answer might add some value since, in my opinion, no one directly addressed some of the concerns in the OP.
Despite what the accepted answer suggests, list objects in R are not hash maps. If you want to draw a parallel with Python, lists are more like, you guessed it, Python lists (or tuples, actually).
It's better to describe how most R objects are stored internally (the C type of an R object is SEXP). They are made basically of three parts:
a header, which declares the R type of the object, the length and some other metadata;
the data part, which is a standard C heap-allocated array (contiguous block of memory);
the attributes, which are a named linked list of pointers to other R objects (or NULL if the object doesn't have attributes).
From an internal point of view, there is little difference between a list and a numeric vector for instance. The values they store are just different. Let's break two objects into the paradigm we described before:
x <- runif(10)
y <- list(runif(10), runif(3))
For x:
The header will say that the type is numeric (REALSXP in the C-side), the length is 10 and other stuff.
The data part will be an array containing 10 double values.
The attributes are NULL, since the object doesn't have any.
For y:
The header will say that the type is list (VECSXP in the C-side), the length is 2 and other stuff.
The data part will be an array containing 2 pointers to two SEXP types, pointing to the value obtained by runif(10) and runif(3) respectively.
The attributes are NULL, as for x.
So the only difference between a numeric vector and a list is that the numeric data part is made of double values, while for the list the data part is an array of pointers to other R objects.
What happens with names? Well, names are just some of the attributes you can assign to an object. Let's see the object below:
z <- list(a=1:3, b=LETTERS)
The header will say that the type is list (VECSXP in the C-side), the length is 2 and other stuff.
The data part will be an array containing 2 pointers to two SEXP types, pointing to the value obtained by 1:3 and LETTERS respectively.
The attributes are now present and are a names component which is a character R object with value c("a","b").
From the R level, you can retrieve the attributes of an object with the attributes function.
The key-value behaviour typical of a hash map is just an illusion in R. When you say:
z[["a"]]
this is what happens:
the [[ subset function is called;
the argument of the function ("a") is of type character, so the method is instructed to search such value from the names attribute (if present) of the object z;
if the names attribute isn't there, NULL is returned;
if present, the "a" value is searched in it. If "a" is not a name of the object, NULL is returned;
if present, the position of the first occurrence is determined (1 in the example). So the first element of the list is returned, i.e. the equivalent of z[[1]].
The key-value search is rather indirect and is always positional. Also, useful to keep in mind:
in hash maps the only limit a key must have is that it must be hashable. names in R must be strings (character vectors);
in hash maps you cannot have two identical keys. In R, you can assign names to an object with repeated values. For instance:
names(y) <- c("same", "same")
is perfectly valid in R. When you try y[["same"]] the first value is retrieved. You should know why at this point.
In conclusion, the ability to give arbitrary attributes to an object gives you the appearance of something different from an external point of view. But R lists are not hash maps in any way.
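The lookup steps described above can be imitated at the R level. This is only an illustration of the positional behaviour, not the internal C code; pos, same_element, first_hit and missing_key are made-up names.

```r
z <- list(a = 1:3, b = LETTERS)

# the names live in the attributes, next to the data
z_attributes <- attributes(z)

# find the position of the first matching name, then index by position
pos <- match("a", names(z))
same_element <- identical(z[["a"]], z[[pos]])

# repeated names are allowed; the first match wins
y <- list(runif(10), runif(3))
names(y) <- c("same", "same")
first_hit <- identical(y[["same"]], y[[1]])

# a name that is not present gives NULL for a list
missing_key <- z[["nope"]]
```

Here pos is 1, and both same_element and first_hit are TRUE.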
x = list(1, 2, 3, 4)
x2 = list(1:4)
all.equal(x,x2)
is not the same, because x holds four elements of length 1 while x2 holds a single element, the vector 1:4 (which has the same values as c(1,2,3,4)).
If you want them to be the same then:
x = list(c(1,2,3,4))
x2 = list(1:4)
all.equal(x,x2)
Although this is a pretty old question I must say it is touching exactly the knowledge I was missing during my first steps in R - i.e. how to express data in my hand as an object in R or how to select from existing objects. It is not easy for an R novice to think "in an R box" from the very beginning.
So I myself started to use crutches below which helped me a lot to find out what object to use for what data, and basically to imagine real-world usage.
Though I am not giving exact answers to the question, the short text below might help a reader who has just started with R and is asking similar questions.
Atomic vector ... I called that "sequence" for myself, no direction, just sequence of same types. [ subsets.
Vector ... the sequence with one direction from 2D, [ subsets.
Matrix ... bunch of vectors with the same length forming rows or columns, [ subsets by rows and columns, or by sequence.
Arrays ... layered matrices forming 3D
Dataframe ... a 2D table like in excel, where I can sort, add or remove rows or columns or make arit. operations with them, only after some time I truly recognized that data frame is a clever implementation of list where I can subset using [ by rows and columns, but even using [[.
List ... to help myself I thought about the list as of tree structure where [i] selects and returns whole branches and [[i]] returns item from the branch. And because it is tree like structure, you can even use an index sequence to address every single leaf on a very complex list using its [[index_vector]]. Lists can be simple or very complex and can mix together various types of objects into one.
So for lists you can end up with more ways how to select a leaf depending on situation like in the following example.
l <- list("aaa",5,list(1:3),LETTERS[1:4],matrix(1:9,3,3))
l[[c(5,4)]] # selects 4 from matrix using [[index_vector]] in list
l[[5]][4] # selects 4 from matrix using sequential index in matrix
l[[5]][1,2] # selects 4 from matrix using row and column in matrix
This way of thinking helped me a lot.
Regarding vectors and the hash/array concept from other languages:
Vectors are the atoms of R. E.g., rpois(1e4,5) (10,000 random Poisson draws with mean 5), numeric(55) (length-55 zero vector over doubles), and character(12) (12 empty strings), are all "basic".
Either lists or vectors can have names.
> n = numeric(10)
> n
[1] 0 0 0 0 0 0 0 0 0 0
> names(n)
NULL
> names(n) = LETTERS[1:10]
> n
A B C D E F G H I J
0 0 0 0 0 0 0 0 0 0
Vectors require everything to be the same data type. Watch this:
> i = integer(5)
> v = c(n,i)
> v
A B C D E F G H I J
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> class(v)
[1] "numeric"
> i = complex(5)
> v = c(n,i)
> class(v)
[1] "complex"
> v
A B C D E F G H I J
0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i 0+0i
Lists can contain varying data types, as seen in other answers and the OP's question itself.
I've seen languages (ruby, javascript) in which "arrays" may contain variable datatypes, but for example in C++ "arrays" must be all the same datatype. I believe this is a speed/efficiency thing: if you have a numeric(1e6) you know its size and the location of every element a priori; if the thing might contain "Flying Purple People Eaters" in some unknown slice, then you have to actually parse stuff to know basic facts about it.
Certain standard R operations also make more sense when the type is guaranteed. For example cumsum(1:9) makes sense whereas cumsum(list(1,2,3,4,5,'a',6,7,8,9)) does not, without the type being guaranteed to be double.
As to your second question:
Lists can be returned from functions even though you never passed in a List when you called the function
Functions return data types different from their input all the time. plot returns a plot even though it doesn't take a plot as an input. Arg returns a numeric even though it accepted a complex. Etc.
(And as for strsplit: the source code is here.)
If it helps, I tend to conceive "lists" in R as "records" in other pre-OO languages:
they do not make any assumptions about an overarching type (or rather the type of all possible records of any arity and field names is available).
their fields can be anonymous (then you access them by strict definition order).
The name "record" would clash with the standard meaning of "records" (aka rows) in database parlance, and maybe this is why their name suggested itself: as lists (of fields).
why do these two different operators, [ ], and [[ ]], return the same result?
x = list(1, 2, 3, 4)
[ ] provides the subsetting operation. In general, a subset of any object
will have the same type as the original object. Therefore, x[1]
returns a list. Similarly, x[1:2] is a subset of the original list,
therefore it is a list. E.g.,
> x[1:2]
[[1]]
[1] 1
[[2]]
[1] 2
[[ ]] is for extracting an element from the list. x[[1]] is valid
and extracts the first element from the list. x[[1:2]] is not valid, as [[ ]]
does not provide subsetting like [ ].
> x[[2]]
[1] 2
> x[[2:3]]
Error in x[[2:3]] : subscript out of bounds
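The error arises because [[ ]] treats an index vector as a path into nested lists rather than as a set of positions. A small sketch with a made-up nested list:

```r
x <- list(1, 2, 3, 4)
nested <- list(list("a", "b"), list("c", "d"))

# c(2, 1) means: element 2, then element 1 inside it
by_path <- nested[[c(2, 1)]]
step_by_step <- nested[[2]][[1]]

# on the flat list, x[[2:3]] asks for element 3 of x[[2]], which has length 1
flat_attempt <- try(x[[2:3]], silent = TRUE)

# [ ] with the same index simply returns a two-element sub-list
sub_list <- x[2:3]
```

Here by_path and step_by_step are both "c", and flat_attempt holds the "subscript out of bounds" error.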
you can try something like,
set.seed(123)
l <- replicate(20, runif(sample(1:10,1)), simplify = FALSE)
out <- vector("list", length(l))
for (i in seq_along(l)) {
out[[i]] <- length(unique(l[[i]])) #length(l[[i]])
}
unlist(out)
unlist(lapply(l,length))
unlist(lapply(l, class))
unlist(lapply(l, mean))
unlist(lapply(l, max))
