I am using the following code in a loop. I have reproduced only the part that causes the problem; the full code is very long, and I have removed the parts between these lines that run fine. This is just to illustrate the problem:
for (j in 1:2)
{
assign(paste("numeric_data",j,sep="_"),unique_id)
for (i in 1:2)
{
assign(paste("numeric_data",j,sep="_"),
merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE))
}
}
The problem is that, in the second step, I want to use eval + paste instead of assign:
for (j in 1:2)
{
assign(paste("numeric_data",j,sep="_"),unique_id)
for (i in 1:2)
{
eval(as.symbol((paste("numeric_data",j,sep="_"))))<-
merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE)
}
}
However, R does not accept eval on the left-hand side of an assignment. I searched the forum, and assign is suggested everywhere as the solution. However, if I use assign, the loop overwrites my previously generated "numeric_data" instead of adding to it, so I get output for only one value of i instead of both.
Here is a very basic intro to one of the most fundamental data structures in R. I highly recommend reading more about them in standard documentation sources.
#A list is a (possibly named) set of objects
numeric_data <- list(A1 = 1, A2 = 2)
#I can refer to elements by name or by position, e.g. numeric_data[[1]]
> numeric_data[["A1"]]
[1] 1
#I can add elements to a list with a particular name
> numeric_data <- list()
> numeric_data[["A1"]] <- 1
> numeric_data[["A2"]] <- 2
> numeric_data
$A1
[1] 1
$A2
[1] 2
#I can refer to named elements by building the name with paste()
> numeric_data[[paste0("A",1)]]
[1] 1
#I can change all the names at once...
> numeric_data <- setNames(numeric_data,paste0("B",1:2))
> numeric_data
$B1
[1] 1
$B2
[1] 2
#...in multiple ways
> names(numeric_data) <- paste0("C",1:2)
> numeric_data
$C1
[1] 1
$C2
[1] 2
Basically, the lesson is that if you have objects with names with numeric suffixes: object_1, object_2, etc. they should almost always be elements in a single list with names that you can easily construct and refer to.
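As a rough sketch of how that applies to the loop in the question: the contents of unique_id and the sd_1 tables below are made up for illustration, and keeping the sd_1 tables in a list is my assumption. With lists you can use [[ and a pasted name on both sides of the assignment, so neither assign() nor eval() is needed:

```r
# Made-up stand-ins for the objects in the question
unique_id <- data.frame(id = 1:3)
sd_1 <- list(
  sd_1_1 = data.frame(id = c(1L, 2L), x = c(10, 20)),
  sd_1_2 = data.frame(id = c(2L, 3L), y = c(200, 300))
)

numeric_data <- list()
for (j in 1:2) {
  key <- paste("numeric_data", j, sep = "_")
  numeric_data[[key]] <- unique_id          # start from the ids
  for (i in 1:2) {
    # each pass merges one more table onto the result of the previous pass
    numeric_data[[key]] <- merge(numeric_data[[key]],
                                 sd_1[[paste("sd_1", i, sep = "_")]],
                                 all.x = TRUE)
  }
}

names(numeric_data[["numeric_data_1"]])     # columns from both merges are present
```

Because the element is read and written through the same key, each merge builds on the previous one instead of starting over.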
I have a dataframe of nested list indices, which have been stored as strings. To give a simplified example:
df1 <- data.frame(x = c("lst$x$y$a", "lst$x$y$b"), stringsAsFactors = F)
These are then coordinates for the following list:
lst <- list(x=list(y=list(a="foo",b="bar",c="")))
I'd like to replace values or assign new values to these elements using the indices in df1.
One attempt was
do.call(`<-`, list(eval(parse(text = df1[1,1])), "somethingelse"))
but this doesn't seem to work. Instead it assigns "somethingelse" to a new variable named foo.
I'm not too happy with using eval(parse(text=)) (maintaining code will become a nightmare), but recognise I may have little choice.
Any tips welcome.
Let's consider 3 situations:
Case 1
do.call(`<-`, list("lst$x$y$a", "somethingelse"))
This will create a new variable named lst$x$y$a in your workspace, so the following two commands will call different objects. (The former is the object you store in lst, and the latter is the new variable. You need to call it with backticks because its name will confuse R.)
> lst$x$y$a # [1] "foo"
> `lst$x$y$a` # [1] "somethingelse"
Case 2
do.call(`<-`, list(parse(text = "lst$x$y$a"), "somethingelse"))
You mostly get what you expect with this one but an error still occurs:
invalid (do_set) left-hand side to assignment
Let's check:
> parse(text = "lst$x$y$a") # expression(lst$x$y$a)
It belongs to the class expression, and the operator <- does not seem to accept this class on its left-hand side.
Case 3
This one will achieve what you want:
do.call(`<-`, list(parse(text = "lst$x$y$a")[[1]], "somethingelse"))
If you put [[1]] after an expression object, a call object is extracted, and the operator <- accepts that as its left-hand side.
> lst
# $x
# $x$y
# $x$y$a
# [1] "somethingelse"
#
# $x$y$b
# [1] "bar"
#
# $x$y$c
# [1] ""
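If you would rather avoid parse() altogether, here is a sketch of an alternative that is not one of the cases above. It assumes every stored index has the simple form lst$name$name..., with no backticks or numeric subscripts, and the replacement values are made up. It splits each string on "$" and relies on [[ accepting a character vector as a recursive index into nested lists:

```r
lst <- list(x = list(y = list(a = "foo", b = "bar", c = "")))
df1 <- data.frame(x = c("lst$x$y$a", "lst$x$y$b"), stringsAsFactors = FALSE)
new_values <- c("somethingelse", "another")   # hypothetical replacement values

for (k in seq_len(nrow(df1))) {
  # "lst$x$y$a" -> c("lst", "x", "y", "a"); drop the leading object name
  keys <- strsplit(df1$x[k], "$", fixed = TRUE)[[1]][-1]
  # a character vector inside [[ ]] walks the nested list one level per key
  lst[[keys]] <- new_values[k]
}

str(lst)
```

This only handles the one list named in the strings; if the stored indices can refer to different objects, you would still need get() or something like Case 3.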
I am trying to understand how to succinctly implement something like the argument capture/parsing/evaluation mechanism that enables the following behavior with dplyr::tibble() (FKA dplyr::data_frame()):
# `b` finds `a` in previous arg
dplyr::tibble(a=1:5, b=a+1)
## a b
## 1 2
## 2 3
## ...
# `b` can't find `a` bc it doesn't exist yet
dplyr::tibble(b=a+1, a=1:5)
## Error in eval_tidy(xs[[i]], unique_output) : object 'a' not found
With base:: classes like data.frame and list, this isn't possible (maybe bc arguments aren't interpreted sequentially(?) and/or maybe bc they get evaluated in the parent environment(?)):
data.frame(a=1:5, b=a+1)
## Error in data.frame(a = 1:5, b = a + 1) : object 'a' not found
list(a=1:5, b=a+1)
## Error: object 'a' not found
So my question is: what might be a good strategy in base R to write a function list2() that is just like base::list() except that it allows tibble() behavior like list2(a=1:5, b=a+1)??
I'm aware that this is part of what "tidyeval" does, but I am interested in isolating the exact mechanism that makes this trick possible. And I'm aware that one could just say list(a <- 1:5, b <- a+1), but I am looking for a solution that does not use global assignment.
What I've been thinking so far: One inelegant and unsafe way to achieve the desired behavior would be the following -- first parse the arguments into strings, then create an environment, add each element to that environment, put them into a list, and return (suggestions for better ways to parse ... into a named list appreciated!):
list2 <- function(...){
# (gross bc we are converting code to strings and then back again)
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
# (terrible bc commas aren't allowed except to separate args!!!)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# (icky bc all args must have names)
for (arg in argstrings){
eval(parse(text=arg), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
This allows us to derive the basic behavior, but it doesn't generalize well at all (see comments in list2() definition):
list2(a=1:5, b=a+1)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We could introduce hacks to fix little things like producing names when they aren't supplied, e.g. like this:
# (still gross but at least we don't have to supply names for everything)
list3 <- function(...){
argstring <- as.character(match.call(expand.dots=FALSE))[2]
argstring <- gsub("^pairlist\\((.+)\\)$", "\\1", argstring)
argstrings <- strsplit(argstring, split=", ?")[[1]]
env <- new.env()
# if a name isn't supplied, create one of the form `v1`, `v2`, ...
ctr <- 0
for (arg in argstrings){
ctr <- ctr+1
if (grepl("^[a-zA-Z_] ?= ?", arg))
eval(parse(text=arg), envir=env)
else
eval(parse(text=paste0("v", ctr, "=", arg)), envir=env)
}
vars <- ls(env)
out <- list()
for (var in vars){
out <- c(out, list(eval(parse(text=var), envir=env)))
}
return(setNames(out, vars))
}
Then instead of this:
# evaluates `a+b-2`, but doesn't include in `env`
list2(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
We get this:
list3(a=1:5, b=a+1, a+b-2)
## $a
## [1] 1 2 3 4 5
##
## $b
## [1] 2 3 4 5 6
##
## $v3
## [1] 1 3 5 7 9
But it feels like there will still be problematic edge cases even if we fix the issue with commas, with names, etc.
Anyone have any ideas/suggestions/insights/solutions/etc.??
Many thanks!
The reason data.frame(a=1:5, b=a+1) doesn't work is a scoping issue, not an evaluation order issue.
Arguments to a function are normally evaluated in the calling frame. When you say a+1, you are referring to the variable a in the frame that made the call to data.frame, not the column that you are about to create.
dplyr::data_frame does very non-standard evaluation, so it can mix up frames as you saw. It appears to look first in the frame corresponding to the object that is under construction, and second in the usual place.
One way to use the dplyr semantics with a base function is to do both,
e.g.
do.call(data.frame, as.list(dplyr::data_frame(a = 1:5, b = a+1)))
but this is kind of useless: you can convert a tibble to a dataframe directly, and this can't be used with other base functions, since it forces all arguments to the same length.
To write your list2 function, I'd recommend looking at the source of dplyr::data_frame, and doing everything it does except the final conversion to a tibble. Its source is deceptively short:
function (...)
{
xs <- quos(..., .named = TRUE)
as_tibble(lst_quos(xs, expand = TRUE))
}
This is deceptive, because lst_quos is a private function in the tibble package, so you'll need your own copy of that, plus any private functions it calls, etc. Unless of course you don't mind using private functions, then here's your list2:
list2 <- function(...) {
xs <- rlang::quos(..., .named = TRUE)
tibble:::lst_quos(xs, expand = TRUE)
}
This will work until the tibble maintainer chooses to change lst_quos, which he's free to do without warning (since it's private). It wouldn't be acceptable code in a CRAN package because of this fragility.
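If you want to stay in base R, the lookup order described above can be approximated without rlang. This is only a sketch under my own assumptions, not what tibble does internally: it captures the unevaluated arguments with substitute(), then evaluates them one at a time in a scratch environment whose parent is the caller's frame, storing each named result there so later arguments can see it. The name list2_base is made up to avoid clashing with the list2 above.

```r
list2_base <- function(...) {
  # unevaluated argument expressions; [-1] drops the leading `list` symbol
  exprs <- as.list(substitute(list(...)))[-1]
  nms <- names(exprs)
  if (is.null(nms)) nms <- rep("", length(exprs))

  # earlier results live here; anything else is found in the caller's frame
  env <- new.env(parent = parent.frame())
  out <- vector("list", length(exprs))
  for (i in seq_along(exprs)) {
    val <- eval(exprs[[i]], env)
    if (nzchar(nms[i])) assign(nms[i], val, envir = env)
    out[i] <- list(val)   # single-bracket assignment keeps NULL results
  }
  names(out) <- nms
  out
}

res <- list2_base(a = 1:5, b = a + 1, a + b - 2)
str(res)
```

Unnamed arguments are evaluated and returned but not stored in the environment, so later arguments cannot refer to them. Arguments forwarded through another function's ... would be evaluated in the wrong frame; handling that properly is what quosures are for.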
Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
It should be noted that acsv, bcsv, and ccsv are all dataframes read in from csv files, i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal of this was not to simply print the frame name - that was the question, because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for(i in 1:length(csvs)){
cat(names(csvs[i]), "\n")
print(nrow(csvs[[i]]))
print(nrow(csvs[[i]][1]))
}
Then the indexing of nested lists was utilized e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the dataframes.
print(nrow(csvs[[i]][1]))
Then provides a table for the first column of each dataframe.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed doesn't "remember" the expressions it was constructed of anymore. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Use this list as the argument to names, as Thomas suggested:
> names(named.list(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
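A small variation on the same idea, again with made-up data: loop over the names themselves, so the label and the data frame are both looked up from one key, or use vapply() when you only need one number per data frame.

```r
# Made-up data frames standing in for the csv files
csvs <- list(acsv = data.frame(x = 1),
             bcsv = data.frame(x = 2:3),
             ccsv = data.frame(x = 4:6))

for (nm in names(csvs)) {
  cat(nm, ":", nrow(csvs[[nm]]), "rows\n")   # label plus row count
}

# vapply keeps the list names on the result
row_counts <- vapply(csvs, nrow, integer(1))
row_counts
```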
Frequently I encounter situations where I need to create a lot of similar models for different variables. Usually I dump them into the list. Here is the example of dummy code:
modlist <- lapply(1:10,function(l) {
data <- data.frame(Y=rnorm(10),X=rnorm(10))
lm(Y~.,data=data)
})
Now getting the fit for example is very easy:
lapply(modlist,predict)
What I want to do sometimes is to extract one element from the list. The obvious way is
sapply(modlist,function(l)l$rank)
This does what I want, but I wonder if there is a shorter way to get the same result?
These are probably a little simpler:
> z <- list(list(a=1, b=2), list(a=3, b=4))
> sapply(z, `[[`, "b")
[1] 2 4
> sapply(z, get, x="b")
[1] 2 4
and you can define a function like:
> `%c%` <- function(x, n)sapply(x, `[[`, n)
> z %c% "b"
[1] 2 4
and also this looks like an extension of $:
> `%$%` <- function(x, n) sapply(x, `[[`, as.character(as.list(match.call())$n))
> z%$%b
[1] 2 4
I usually use kohske's way, but here is another trick:
sapply(modlist, with, rank)
It is more useful when you need more elements, e.g.:
sapply(modlist, with, c(rank, df.residual))
As I remember I stole it from hadley (from plyr documentation I think).
The main difference between the [[ and with solutions shows up when elements are missing. [[ returns NULL when the element is missing. with throws an error unless an object with the same name as the searched element exists in the global workspace. So e.g.:
dah <- 1
lapply(modlist, with, dah)
returns a list of ones when the elements of modlist don't have any dah element.
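Here is a small sketch of that difference on a toy list (z is made up; the second element deliberately has no b). The tryCatch() is only there so the example runs to the end when with fails:

```r
z <- list(list(a = 1, b = 2), list(a = 3))   # second element has no `b`

# [[ quietly gives NULL for the missing element
by_bracket <- lapply(z, `[[`, "b")

# with() fails on the second element unless some object named `b`
# happens to be visible elsewhere; capture the message instead of stopping
by_with <- tryCatch(lapply(z, with, b),
                    error = function(e) conditionMessage(e))

str(by_bracket)
by_with
```

Which behaviour you want depends on whether a missing element should be treated as a bug or as a normal case.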
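With Hadley's new lowliner package (since renamed purrr; there, typed variants such as map_dbl() and map_chr() take the place of map_v()) you can supply map() with a numeric index or an element name to elegantly pluck components out of a list. map() is the equivalent of lapply() with some extra tricks.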
library("lowliner")
l <- list(
list(a = 1, b = 2),
list(a = 3, b = 4)
)
map(l, "b")
map(l, 2)
There is also a version that simplifies the result to a vector:
map_v(l, "a")
map_v(l, 1)
I'm using R, and I'm a beginner. I have two large lists (30K elements each). One is called descriptions, where each element is (maybe) a tokenized string. The other is called probes, where each element is a number. I need to make a dictionary that maps probes to something in descriptions, if that something is there. Here's how I'm going about this:
probe2gene <- list()
for (i in 1:length(probes)){
strings<-strsplit(descriptions[i], '//')
if (length(strings[[1]]) > 1){
probe2gene[probes[i]] = strings[[1]][2]
}
}
Which works fine, but seems slow, much slower than the roughly equivalent python:
probe2gene = {}
for p, d in zip(probes, descriptions):
    try:
        probe2gene[p] = d.split('//')[1]
    except IndexError:
        pass
My question: is there an "R-thonic" way of doing what I'm trying to do? The R manual entry on for loops suggests that such loops are rare. Is there a better solution?
Edit: a typical good "description" looks like this:
"NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"
A bad "description" looks like this:
"-----"
though it can quite easily be some other not-very-helpful string. Each probe is simply a number. The probes and descriptions vectors are the same length and correspond element by element, i.e. probes[i] maps to descriptions[i].
It's usually better in R if you use the various apply-like functions, rather than a loop. I think this solves your problem; the only drawback is that you have to use string keys.
> descriptions <- c("foo//bar", "")
> probes <- c(10, 20)
> probe2gene <- lapply(strsplit(descriptions, "//"), function (x) x[2])
> names(probe2gene) <- probes
> probe2gene <- probe2gene[!is.na(probe2gene)]
> probe2gene[["10"]]
[1] "bar"
Unfortunately, R doesn't have a good dictionary/map type. The closest I've found is using lists as a map from string-to-value. That seems to be idiomatic, but it's ugly.
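If a plain character-to-character lookup is enough, a named character vector is a somewhat lighter variant of the same idea. This is only a sketch with made-up descriptions and probe numbers, and trimming the whitespace around the gene symbol is my assumption about what is wanted:

```r
# Made-up data in the shape described in the question
descriptions <- c("NM_1 // Rb1cc1 // foo", "-----", "NM_2 // Abc // bar")
probes <- c(101, 102, 103)

parts <- strsplit(descriptions, "//", fixed = TRUE)
# x[2] is NA when a description has fewer than two fields
second <- vapply(parts, function(x) trimws(x[2]), character(1))

keep <- !is.na(second)
probe2gene <- setNames(second[keep], probes[keep])   # names become strings

probe2gene[["101"]]
```

Lookups still go through string keys, so a numeric probe has to be converted with as.character() before indexing.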
If I understand correctly, you are looking to save each probe-description combination where there is more than one (split) value in the description?
Probe and Description are the same length?
This is kind of messy but a quick first pass at it?
a <- list("a","b","c")
b <- list(c("a","b"),c("DEF","ABC"),c("Z"))
names(b) <- a
matches <- which(lapply(b, length)>1) #several ways to do this
b <- lapply(b[matches], function(x) x[2]) #keeps the second element only
That's my first attempt. If you have a sample dataset that would be very useful.
Best regards,
Jay
Another way.
probe<-c(4,3,1)
gene<-c('red//hair','strange','blue//blood')
probe2gene<-character()
probe2gene[probe]<-sapply(strsplit(gene,'//'),'[',2)
probe2gene
[1] "blood" NA NA "hair"
In the sapply, we take advantage of the fact that in R the subsetting operator is also a function named '[' to which we can pass the index as an argument. Also, an out-of-range index does not cause an error but gives an NA value. On the left-hand side of the same line, we use the fact that we can pass a vector of indices in any order and with gaps.
Here's another approach that should be fast. Note that this doesn't
remove the empty descriptions. It could be adapted to do that or you
could clean those in a post processing step using lapply. Is it the
case that you'll never have a valid description of length one?
make_desc <- function(n)
{
word <- function(x) paste(sample(letters, 5, replace=TRUE), collapse = "")
if (runif(1) < 0.70)
paste(sapply(seq_len(n), word), collapse = "//")
else
"----"
}
description <- sapply(seq_len(10), make_desc)
probes <- seq_len(length(description))
desc_parts <- strsplit(description, "//", fixed=TRUE, useBytes=TRUE)
lens <- sapply(desc_parts, length)
probes_expand <- rep(probes, lens)
ans <- split(unlist(desc_parts), probes_expand)
> description
[1] "fmbec"
[2] "----"
[3] "----"
[4] "frrii//yjxsa//wvkce//xbpkc"
[5] "kazzp//ifrlz//ztnkh//dtwow//aqvcm"
[6] "stupm//ncqhx//zaakn//kjymf//swvsr//zsexu"
[7] "wajit//sajgr//cttzf//uagwy//qtuyh//iyiue//xelrq"
[8] "nirex//awvnw//bvexw//mmzdp//lvetr//xvahy//qhgym//ggdax"
[9] "----"
[10] "ubabx//tvqrd//vcxsp//rjshu//gbmvj//fbkea//smrgm//qfmpy//tpudu//qpjbu"
> ans[[3]]
[1] "----"
> ans[[4]]
[1] "frrii" "yjxsa" "wvkce" "xbpkc"