I am trying to understand names, lists and lists of lists in R. It would be convenient to have a way to dynamically label them like this:
> ll <- list("1" = 2)
> ll
$`1`
[1] 2
But this is not working:
> ll <- list(as.character(1) = 2)
Error: unexpected '=' in "ll <- list(as.character(1) ="
Neither is this:
> ll <- list(paste(1) = 2)
Error: unexpected '=' in "ll <- list(paste(1) ="
Why is that? Both paste() and as.character() are returning "1".
The reason is that paste(1) is a function call that evaluates to a string, not a string itself.
The R Language Definition says this:
Each argument can be tagged (tag=expr), or just be a simple expression.
It can also be empty or it can be one of the special tokens ‘...’, ‘..2’, etc.
A tag can be an identifier or a text string.
Thus, tags can't be expressions.
However, if you want to set names (which are just an attribute), you can do so with structure, e.g.:
> structure(1:5, names=LETTERS[1:5])
A B C D E
1 2 3 4 5
Here, LETTERS[1:5] is most definitely an expression.
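The same idea carries over to the list from the question. Here is a minimal sketch, assuming only base R; the variable names key, ll1 and ll2 are made up for illustration. Because names is an ordinary argument here, not a tag, it can be any expression:

```r
# The name is computed at run time instead of being typed as a tag.
key <- as.character(1)

# Option 1: attach the names attribute with structure().
ll1 <- structure(list(2), names = key)

# Option 2: setNames() from the stats package does the same thing.
ll2 <- setNames(list(2), paste(1))

# Both are the same as typing list("1" = 2) by hand.
print(identical(ll1, ll2))
print(names(ll1))
```

Both calls return a one-element list whose only name is the string "1".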
If your goal is simply to use integers as names (as in the question title), you can type them in with backticks or single- or double-quotes (as the OP already knows). They are converted to characters, since all names are characters in R.
I can't offer a deep technical explanation for why your later code fails beyond "the left-hand side of = is not evaluated in that context (of enumerating items in a list)". Here's one workaround:
mylist <- list()
mylist[[paste("a")]] <- 2
mylist[[paste("b")]] <- 3
mylist[[paste("c")]] <- matrix(1:4,ncol=2)
mylist[[paste("d")]] <- mean
And here's another:
library(data.table)
tmp <- rbindlist(list(
list(paste("a"), list(2)),
list(paste("b"), list(3)),
list(paste("c"), list(matrix(1:4,ncol=2))),
list(paste("d"), list(mean))
))
res <- setNames(tmp$V2,tmp$V1)
identical(mylist,res) # TRUE
The drawbacks of each approach are pretty serious, I think. On the other hand, I've never found myself in need of richer naming syntax.
I have this vector
v <- c("firstOne","firstTwo","secondOne")
I would like to factor the vector assigning c("firstOne","firstTwo") to the same level (i.e., firstOne). I have tried this:
> factor(v, labels = c("firstOne", "firstOne", "secondOne"))
[1] firstOne firstOne secondOne
Levels: firstOne firstOne secondOne
But I get a duplicate factor (and a warning message advising not to use it). Instead, I would like the output to look like:
[1] firstOne firstOne secondOne
Levels: firstOne secondOne
Is there any way to get this output without brutally substituting the character strings?
Here are a couple of options:
v <- factor(ifelse(v %in% c("firstOne", "firstTwo"), "firstOne", "secondOne"))
v <- factor(v, levels = c("firstOne", "secondOne")); v[is.na(v)] <- "firstOne"
A factor is just a numeric (integer) vector with labels, and so manipulating a factor is equivalent to manipulating integers, rather than character strings. Therefore, performance-wise, it is perfectly OK to do
f <- as.factor(v)
f[f %in% c('firstOne', 'firstTwo')] <- 'firstOne'
f <- droplevels(f)
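Another base R route, for what it's worth: the replacement function levels<- accepts a named list, where each name is a new level and each value lists the old levels to merge into it (this is described in ?levels). A small sketch on the vector from the question:

```r
v <- c("firstOne", "firstTwo", "secondOne")
f <- factor(v)

# Each list name is a new level; each value holds the old levels folded into it.
levels(f) <- list(firstOne = c("firstOne", "firstTwo"),
                  secondOne = "secondOne")

print(f)
print(levels(f))
```

This avoids touching the character strings and leaves no unused level behind, so no droplevels() call is needed.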
You could use the rec-function of the sjmisc-package:
rec(v, "firstTwo=firstOne;else=copy", as.fac = T)
> [1] firstOne firstOne secondOne
> Levels: firstOne secondOne
(the output is shortened; note that the sjmisc-package supports labelled data and thus adds label attributes to the vector, which you'll see in the console output as well)
Eventually I also found a solution that looks somewhat sloppy, but I don't see any major issues with it (I'd be glad to hear about possible problems, though):
v <- c("firstOne","firstTwo","secondOne")
factor(v)
factor(factor(v,labels = c("firstOne","firstOne","secondOne")))
Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
It should be noted that acsv, bcsv, and ccsv are all dataframes read in from csvs i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal of this was not to simply print the frame name - that was the question, because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for(i in 1:length(csvs)){
cat(names(csvs[i]), "\n")
print(nrow(csvs[[i]]))
print(nrow(csvs[[i]][1]))
}
Then the indexing of nested lists was utilized e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the dataframes.
print(nrow(csvs[[i]][1]))
Then provides a table for the first column of each dataframe.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed doesn't "remember" the expressions it was constructed of anymore. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Use this list as parameter to names, as Thomas suggested:
> names(named.list(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
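Applied to the data frames from the question, this might look like the sketch below. The data frames are made-up stand-ins for the read.csv() results, and vapply() is used instead of lapply() only so that the names come back as a plain character vector:

```r
# Same constructor as above, restated so this snippet runs on its own.
named.list <- function(...) {
  l <- list(...)
  # substitute() recovers the unevaluated argument expressions; [-1] drops `list`.
  names(l) <- vapply(as.list(substitute(list(...)))[-1], deparse, character(1))
  l
}

# Hypothetical stand-ins for read.csv("a.csv") etc.
acsv <- data.frame(x = 1)
bcsv <- data.frame(x = 2)
ccsv <- data.frame(x = 3)

csvs <- named.list(acsv, bcsv, ccsv)
print(names(csvs))
```

Each list element is then labelled with the variable name it was built from, which is what the loop in the question needs.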
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
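A slightly different sketch of the same loop, iterating over the names directly instead of over 1:length(csvs) (which misbehaves on an empty list). The data are again made up, and row_counts is just an illustrative name:

```r
# Made-up data frames with 1, 2 and 3 rows.
csvs <- list(acsv = data.frame(x = 1),
             bcsv = data.frame(x = 2:3),
             ccsv = data.frame(x = 4:6))

# Loop over the names, so the label and the data stay together.
for (nm in names(csvs)) {
  cat(nm, ":", nrow(csvs[[nm]]), "rows\n")
}

# The same information as a named integer vector.
row_counts <- vapply(csvs, nrow, integer(1))
print(row_counts)
```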
I want to use information from a field and include it in a R function, e.g.:
data #name of the data.frame with only one row
"(if(nclusters>0){OptmizationInputs[3,3]*beta[1]}else{0})" # this is the row
If I want to use this information inside a function how could I do it?
Another example:
A=c('x^2')
B=function (x) A
B(2)
"x^2" # this is the return. I would like to have the return something like 2^2=4.
Use body<- and parse
A <- 'x^2'
B <- function(x) {}
body(B) <- parse(text = A)
B(3)
## [1] 9
There are more ideas here
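If rewriting the function body feels heavy-handed, one alternative sketch is to parse the string once and evaluate it with the argument supplied as data. The name expr is made up here; the usual caveat applies that evaluating text from a data field runs whatever code that text contains:

```r
A <- "x^2"

# Parse once; [[1]] extracts the call x^2 from the expression vector.
expr <- parse(text = A)[[1]]

# Evaluate the parsed call with x bound to the function argument.
B <- function(x) eval(expr, list(x = x))

print(B(2))
print(B(3))
```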
Another option using plyr:
A <- 'x^2'
library(plyr)
B <- function(x) {}
body(B) <- as.quoted(A)[[1]]
> B(5)
[1] 25
A <- "x^2"; x <- 2
BB <- function(z){ print( as.expression(do.call("substitute",
list( parse(text=A)[[1]], list(x=eval(x) ) )))[[1]] );
cat( "is equal to ", eval(parse(text=A)))
}
BB(2)
#2^2
#is equal to 4
Managing expressions in R is very weird. substitute refuses to evaluate its first argument so you need to use do.call to allow the evaluation to occur before the substitution. Furthermore the printed representation of the expressions hides their underlying representation. Try removing the fairly cryptic (to my way of thinking) [[1]] after the as.expression(.) result.
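To see the two steps separately, here is a stripped-down sketch of the same do.call trick; the names e and filled are illustrative only:

```r
A <- "x^2"

# The unevaluated call x^2.
e <- parse(text = A)[[1]]

# do.call() evaluates e first, so substitute() receives the call itself
# rather than the symbol `e`, and can replace x inside it.
filled <- do.call(substitute, list(e, list(x = 2)))

print(deparse(filled))  # the substituted expression as text
print(eval(filled))     # its value
```

Calling substitute(e, list(x = 2)) directly would just return the symbol e, because substitute does not evaluate its first argument.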
Frequently I encounter situations where I need to create a lot of similar models for different variables. Usually I dump them into a list. Here is an example of dummy code:
modlist <- lapply(1:10,function(l) {
data <- data.frame(Y=rnorm(10),X=rnorm(10))
lm(Y~.,data=data)
})
Now getting the fit for example is very easy:
lapply(modlist,predict)
What I want to do sometimes is to extract one element from the list. The obvious way is
sapply(modlist,function(l)l$rank)
This does what I want, but I wonder if there is a shorter way to get the same result?
Probably these are a little bit simple:
> z <- list(list(a=1, b=2), list(a=3, b=4))
> sapply(z, `[[`, "b")
[1] 2 4
> sapply(z, get, x="b")
[1] 2 4
and you can define a function like:
> `%c%` <- function(x, n)sapply(x, `[[`, n)
> z %c% "b"
[1] 2 4
and also this looks like an extension of $:
> `%$%` <- function(x, n) sapply(x, `[[`, as.character(as.list(match.call())$n))
> z%$%b
[1] 2 4
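For the modlist from the question, the `[[` version would look roughly like this. It is a sketch with three models instead of ten, and the seed is only there to make the made-up data reproducible:

```r
set.seed(1)

# Three dummy models, built the same way as in the question.
modlist <- lapply(1:3, function(i) {
  d <- data.frame(Y = rnorm(10), X = rnorm(10))
  lm(Y ~ ., data = d)
})

# `[[` is an ordinary function, so it can be passed to sapply() with the
# element name as an extra argument.
ranks <- sapply(modlist, `[[`, "rank")
print(ranks)
```

Each model has an intercept and one slope, so every rank should come out as 2.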
I usually use kohske's way, but here is another trick:
sapply(modlist, with, rank)
It is more useful when you need more elements, e.g.:
sapply(modlist, with, c(rank, df.residual))
As I remember I stole it from hadley (from plyr documentation I think).
The main difference between the [[ and with solutions is how they handle missing elements. [[ returns NULL when the element is missing. with throws an error unless an object with the same name as the requested element exists in the global workspace. So e.g.:
dah <- 1
lapply(modlist, with, dah)
returns a list of ones when the models in modlist don't have a dah element.
With Hadley's new lowliner package you can supply map() with a numeric index or an element name to elegantly pluck components out of a list. map() is the equivalent of lapply() with some extra tricks.
library("lowliner")
l <- list(
list(a = 1, b = 2),
list(a = 3, b = 4)
)
map(l, "b")
map(l, 2)
There is also a version that simplifies the result to a vector:
map_v(l, "a")
map_v(l, 1)
I stumbled upon this weird behavior in R:
> a = 5
> names(a) <- "bar"
> b = c(foo = a)
> names(b)
[1] "foo.bar"
Why do the names get concatenated/stacked?
I found this c(a=b) syntax in a script, but I couldn't find documentation about it. Is there any documentation for that?
Why do the names get concatenated/stacked?
Because it preserves all the name information that was present before the concatenation. If you don't like it, use unname.
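A quick sketch of the unname route on the example from the question; b_stacked and b_plain are made-up names:

```r
a <- 5
names(a) <- "bar"

# Default behaviour: the tag and the existing name are joined with a dot.
b_stacked <- c(foo = a)
print(names(b_stacked))

# Dropping the inner name first leaves only the tag.
b_plain <- c(foo = unname(a))
print(names(b_plain))
```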
I found this c(a=b) syntax in a script, but I couldn't find documentation about it. Is there any documentation for that?
Some of the examples on the ?c page demonstrate c(name = value) behaviour, but there isn't much more to it than that. You might also want to look at ?names.
It might also be instructive to see what happens if a is a vector; in this case if foo=a just redefined the name, all elements of the vector would end up with the same name. Instead, as in the following example, the four elements end up with unique names, which can be nice.
> a <- c(A=1, B=2)
> b <- c(A=3, B=4)
> c(a=a, b=b)
a.A a.B b.A b.B
1 2 3 4