Replacing items in one list from another w/matching names - r

I feel like I've forgotten something very obvious here...
Let's say we have two lists, a and b, with differing lengths:
a <- list(me = "you1", they = "our1", our = "till1", grow = "NOPE1")
b <- list(me = "my2", their = "his2", our = "aft2", new = "noise2",
they = "now2", b_names = "thurs2")
We want to replace the items in a with corresponding items from b, if an item in b has the same name as an item in a.
Manually, essentially this would equate to replacing: me, our, they in list a from those items in list b.
For the life of me, the only approach I'm coming up with is using Reduce, rather than match or %chin% etc., to find the intersection of names and then always using the last list object as the look-up table. I suppose you really don't need Reduce, since intersect would work fine on its own, but regardless...
Isn't there a simpler, more straightforward way that I am simply forgetting?
Here's my code. It works, but that's not the point.
reduce.names <- function(...){
  vars <- list(...)
  if(length(vars) > 2){
    return("only 2 lists allowed...")
  }else {
    Reduce(intersect, Map(names,vars))
  }
}
> matched_names <- reduce.names(a,b)
> matched_names
[1] "me" "they" "our"
a[matched_names] <- b[matched_names]
> a
$me
[1] "my2"
$they
[1] "now2"
$our
[1] "aft2"
$grow
[1] "NOPE1"
Here's another approach that works, but it just seems redundant and sketchy:
> merge(a,b) %>% .[names(a)]
$me
[1] "my2"
$they
[1] "now2"
$our
[1] "aft2"
$grow
[1] "NOPE1"
Any advice/alternate approach/reminder of some base function I have completely forgotten would be greatly appreciated. Thanks.
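A minimal sketch of a shorter base-R route, assuming the goal is exactly the one described above (overwrite only the names the two lists share). intersect() on the two names vectors replaces the helper function, and modifyList() from the utils package gives the same result in one call. The names shared, a1 and a2 are illustrative only.

```r
a <- list(me = "you1", they = "our1", our = "till1", grow = "NOPE1")
b <- list(me = "my2", their = "his2", our = "aft2", new = "noise2",
          they = "now2", b_names = "thurs2")

# Names present in both lists, in the order they appear in a
shared <- intersect(names(a), names(b))

# Variant 1: plain subassignment on the shared names
a1 <- a
a1[shared] <- b[shared]

# Variant 2: modifyList() overwrites matching names; restricting b to the
# shared names keeps it from appending "their", "new" and "b_names"
a2 <- modifyList(a, b[shared])

identical(a1, a2)
```

Both variants keep the order and length of a; only the values under me, they and our change.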

Related

Alternative to assign function in r

I am using the following code in a loop. I am only reproducing the part where I am facing the problem. The entire code is extremely long, and I have removed the parts between these lines that run fine. This is just to explain the problem:
for (j in 1:2)
{
  assign(paste("numeric_data",j,sep="_"),unique_id)
  for (i in 1:2)
  {
    assign(paste("numeric_data",j,sep="_"),
           merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
                 eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE))
  }
}
The problem that I am facing is that instead of assign in the second step, I want to use (eval+paste)
for (j in 1:2)
{
  assign(paste("numeric_data",j,sep="_"),unique_id)
  for (i in 1:2)
  {
    eval(as.symbol((paste("numeric_data",j,sep="_"))))<-
      merge(eval(as.symbol(paste("numeric_data",j,sep="_"))),
            eval(as.symbol(paste("sd_1",i,sep="_"))),all.x = TRUE)
  }
}
However, R does not accept eval on the left-hand side of an assignment. I looked at the forum, and everywhere assign is suggested to solve the problem. However, if I use assign, the loop overwrites my previously generated "numeric_data" instead of adding to it, hence I get output for only one value of i instead of both.
Here is a very basic intro to one of the most fundamental data structures in R. I highly recommend reading more about them in standard documentation sources.
#A list is a (possibly named) set of objects
numeric_data <- list(A1 = 1, A2 = 2)
#I can refer to elements by name or by position, e.g. numeric_data[[1]]
> numeric_data[["A1"]]
[1] 1
#I can add elements to a list with a particular name
> numeric_data <- list()
> numeric_data[["A1"]] <- 1
> numeric_data[["A2"]] <- 2
> numeric_data
$A1
[1] 1
$A2
[1] 2
#I can refer to named elements by building the name with paste()
> numeric_data[[paste0("A",1)]]
[1] 1
#I can change all the names at once...
> numeric_data <- setNames(numeric_data,paste0("B",1:2))
> numeric_data
$B1
[1] 1
$B2
[1] 2
#...in multiple ways
> names(numeric_data) <- paste0("C",1:2)
> numeric_data
$C1
[1] 1
$C2
[1] 2
Basically, the lesson is that if you have objects with names with numeric suffixes: object_1, object_2, etc. they should almost always be elements in a single list with names that you can easily construct and refer to.
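As a rough sketch of how that lesson could apply to the merge loop in the question: the numbered objects become elements of two lists, and the inner loop accumulates merges into one list element. The contents of unique_id and sd_1 below are invented toy data, and the shared key column id is an assumption, since the question does not show the real data.

```r
# Toy stand-ins for the asker's objects (assumed structure)
unique_id <- data.frame(id = 1:3)
sd_1 <- list(
  data.frame(id = c(1L, 2L), x = c(10, 20)),    # plays the role of sd_1_1
  data.frame(id = c(2L, 3L), y = c(200, 300))   # plays the role of sd_1_2
)

numeric_data <- list()
for (j in 1:2) {
  # Start each result from the id table, then merge every sd_1 element into it
  numeric_data[[j]] <- unique_id
  for (i in seq_along(sd_1)) {
    numeric_data[[j]] <- merge(numeric_data[[j]], sd_1[[i]], all.x = TRUE)
  }
}
names(numeric_data) <- paste0("numeric_data_", 1:2)

numeric_data[["numeric_data_1"]]
```

Because each merge reads from and writes to the same list element, the result keeps the columns from every i instead of only the last one, with no assign() or eval() involved.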

R - Return an object name from a for loop

Using a basic function such as this:
myname<-function(z){
nm <-deparse(substitute(z))
print(nm)
}
I'd like the name of the item to be printed (or returned) when iterating through a list e.g.
for (csv in list(acsv, bcsv, ccsv)){
myname(csv)
}
should print:
acsv
bcsv
ccsv
(and not csv).
It should be noted that acsv, bcsv, and ccsvs are all dataframes read in from csvs i.e.
acsv = read.csv("a.csv")
bcsv = read.csv("b.csv")
ccsv = read.csv("c.csv")
Edit:
I ended up using a bit of a compromise. The primary goal of this was not to simply print the frame name - that was the question, because it is a prerequisite for doing other things.
I needed to run the same functions on four identically formatted files. I then used this syntax:
for(i in 1:length(csvs)){
cat(names(csvs[i]), "\n")
print(nrow(csvs[[i]]))
print(nrow(csvs[[i]][1]))
}
Then the indexing of nested lists was utilized e.g.
print(nrow(csvs[[i]]))
which shows the row count for each of the dataframes.
print(nrow(csvs[[i]][1]))
Then provides a table for the first column of each dataframe.
I include this because it was the motivator for the question. I needed to be able to label the data for each dataframe being examined.
The list you have constructed doesn't "remember" the expressions it was constructed of anymore. But you can use a custom constructor:
named.list <- function(...) {
l <- list(...)
exprs <- lapply(substitute(list(...))[-1], deparse)
names(l) <- exprs
l
}
And so:
> named.list(1+2,sin(5),sqrt(3))
$`1 + 2`
[1] 3
$`sin(5)`
[1] -0.9589243
$`sqrt(3)`
[1] 1.732051
Use this list as the argument to names, as Thomas suggested:
> names(named.list(1+2,sin(5),sqrt(3)))
[1] "1 + 2" "sin(5)" "sqrt(3)"
To understand what's happening here, let's analyze the following:
> as.list(substitute(list(1+2,sqrt(5))))
[[1]]
list
[[2]]
1 + 2
[[3]]
sqrt(5)
The [-1] indexing leaves out the first element, and all remaining elements are passed to deparse, which works because of...
> lapply(as.list(substitute(list(1+2,sqrt(5))))[-1], class)
[[1]]
[1] "call"
[[2]]
[1] "call"
Note that you cannot "refactor" the call list(...) inside substitute() to use simply l. Do you see why?
I am also wondering if such a function is already available in one of the countless R packages around. I have found this post by William Dunlap effectively suggesting the same approach.
I don't know what your data look like, so here's something made up:
csvs <- list(acsv=data.frame(x=1), bcsv=data.frame(x=2), ccsv=data.frame(x=3))
for(i in 1:length(csvs))
cat(names(csvs[i]), "\n")
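If the data frames already exist as separate variables, one possible way to get such a named list without retyping each name twice is mget(), which looks variables up by name and returns them as a list named after them. This is only a sketch: the three data frames below are made-up stand-ins for the results of read.csv().

```r
# Stand-ins for read.csv("a.csv") etc. (invented data)
acsv <- data.frame(x = 1:2)
bcsv <- data.frame(x = 1:3)
ccsv <- data.frame(x = 1:4)

# mget() returns a list whose names are the variable names
csvs <- mget(c("acsv", "bcsv", "ccsv"), envir = environment())

# Iterate over the names so that each label travels with its data
for (nm in names(csvs)) {
  cat(nm, nrow(csvs[[nm]]), "\n")
}
```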

Losing data by using the function unlist

I have a simple but strange problem.
indices.list is a list containing 118,771 elements (integers or numerics). By applying the function unlist, I lose about 500 elements.
Look at the following code:
> indices <- unlist(indices.list, use.names = FALSE)
>
> length(indices.list)
[1] 118771
> length(indices)
[1] 118248
How is that possible? I checked whether indices.list contains any NA, but it does not:
> any(is.na(indices.list) == TRUE)
[1] FALSE
data.set.merged is a dataframe containing more than 200,000 rows. When I use the vector indices (which apparently has the length 118,248) in order to get a subset of data.set.merged, I get a dataframe with 118,771 rows!?? That's so strange!
> data.set.merged.2 <- data.set.merged[indices, ]
> nrow(data.set.merged.2)
[1] 118771
Any ideas what's going on here?
Well, for your first mystery, the likely explanation is that some elements of indices.list are NULL, which means they will disappear when you use unlist:
unlist(list(a = 1,b = "test",c = 2,d = NULL, e = 5))
a b c e
"1" "test" "2" "5"

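To check whether that explanation applies, one option is to look at the length of each element rather than testing for NA. The sketch below uses a small invented list in place of the real indices.list; lengths() reports the size of every element, so NULL and other zero-length entries show up as 0 even though is.na() does not flag them.

```r
# Invented stand-in for the real indices.list
indices.list <- list(3L, NULL, 7L, integer(0), 9L)

indices <- unlist(indices.list, use.names = FALSE)
length(indices.list)   # 5
length(indices)        # 3: the two empty elements contribute nothing

# is.na() does not flag NULL or zero-length elements
any(is.na(indices.list))

# Positions of the elements that vanish in unlist()
empty <- which(lengths(indices.list) == 0)
empty
```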
Assigning NULL to a list element in R?

I found this behaviour odd and wanted more experienced users to share their thoughts and workarounds.
On running the code sample below in R:
sampleList <- list()
d<- data.frame(x1 = letters[1:10], x2 = 1:10, stringsAsFactors = FALSE)
for(i in 1:nrow(d)) {
sampleList[[i]] <- d$x1[i]
}
print(sampleList[[1]])
#[1] "a"
print(sampleList[[2]])
#[1] "b"
print(sampleList[[3]])
#[1] "c"
print(length(sampleList))
#[1] 10
sampleList[[2]] <- NULL
print(length(sampleList))
#[1] 9
print(sampleList[[2]])
#[1] "c"
print(sampleList[[3]])
#[1] "d"
The list elements get shifted up.
Maybe this is as expected, but I am trying to implement a function where I merge two elements of a list and drop one. I basically want to lose that list index or have it as NULL.
Is there any way I can assign NULL to it and not see the above behaviour?
Thank you for your suggestions.
Good question.
Check out the R-FAQ:
In R, if x is a list, then x[i] <- NULL and x[[i]] <- NULL remove the specified elements from x. The first of these is incompatible with S, where it is a no-op. (Note that you can set elements to NULL using x[i] <- list(NULL).)
Consider the following example:
> t <- list(1,2,3,4)
> t[[3]] <- NULL # removing the 3rd element (the following elements shift up)
> t[2] <- list(NULL) # setting the 2nd element to NULL
> t
[[1]]
[1] 1
[[2]]
NULL
[[3]]
[1] 4
UPDATE:
As the author of the R Inferno commented, there can be more subtle situations when dealing with NULL. Consider this fairly general code structure:
# x is some list(), now we want to process it.
> for (i in 1:n) x[[i]] <- some_function(...)
Be aware that if some_function() returns NULL, you may not get what you want: some elements will simply disappear. Use the lapply function instead.
Take a look at this toy example:
> initial <- list(1,2,3,4)
> processed_by_for <- list(0,0,0,0)
> processed_by_lapply <- list(0,0,0,0)
> toy_function <- function(x) {if (x%%2==0) return(x) else return(NULL)}
> for (i in 1:4) processed_by_for[[i]] <- toy_function(initial[[i]])
> processed_by_lapply <- lapply(initial, toy_function)
> processed_by_for
[[1]]
[1] 0
[[2]]
[1] 2
[[3]]
NULL
[[4]]
[1] 4
> processed_by_lapply
[[1]]
NULL
[[2]]
[1] 2
[[3]]
NULL
[[4]]
[1] 4
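If a for loop is still preferred, the x[i] <- list(NULL) form quoted from the R-FAQ can be used inside it. The following is a sketch under that assumption, reusing toy_function and initial from the example above; the name out is illustrative. Wrapping each result in list() and assigning with single brackets stores a NULL result instead of deleting the slot.

```r
toy_function <- function(x) if (x %% 2 == 0) x else NULL
initial <- list(1, 2, 3, 4)

# Pre-allocate a list of the right length (all elements start as NULL)
out <- vector("list", length(initial))
for (i in seq_along(initial)) {
  # Single brackets plus list(): a NULL result is stored, not removed
  out[i] <- list(toy_function(initial[[i]]))
}

length(out)
identical(out, lapply(initial, toy_function))
```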
Your question is a bit confusing to me.
Assigning NULL to an existing object essentially deletes that object (this can be very handy, for instance, if you have a data frame and wish to delete specific columns). That's what you've done. I am unable to determine what it is that you want, though. You could try
sampleList[[2]] <- NA
instead of NULL, but if by "I want to lose" you mean delete it, then you've already succeeded. That's why, "The list elements get shifted up."
obj = list(x = "Some Value")
obj = c(obj,list(y=NULL)) #ADDING NEW VALUE
obj['x'] = list(NULL) #SETTING EXISTING VALUE
obj
If you need to create a list of NULL values that you can later populate with values (data frames, for example), this works without complaint:
B <-vector("list", 2)
a <- iris[sample(nrow(iris), 10), ]
b <- iris[sample(nrow(iris), 10), ]
B[[1]]<-a
B[[2]]<-b
The above answers are similar, but I thought this was worth posting.
Took me a while to figure this one out for a list of lists. My solution was:
mylist[[i]][j] <- list(double())

R-thonic replacement for simple for loops containing a condition

I'm using R, and I'm a beginner. I have two large lists (30K elements each). One is called descriptions, where each element is (maybe) a tokenized string. The other is called probes, where each element is a number. I need to make a dictionary that maps probes to something in descriptions, if that something is there. Here's how I'm going about this:
probe2gene <- list()
for (i in 1:length(probes)){
strings<-strsplit(descriptions[i], '//')
if (length(strings[[1]]) > 1){
probe2gene[probes[i]] = strings[[1]][2]
}
}
Which works fine, but seems slow, much slower than the roughly equivalent python:
probe2gene = {}
for p, d in zip(probes, descriptions):
    try:
        probe2gene[p] = d.split('//')[1]
    except IndexError:
        pass
My question: is there an "R-thonic" way of doing what I'm trying to do? The R manual entry on for loops suggests that such loops are rare. Is there a better solution?
Edit: a typical good "description" looks like this:
"NM_009826 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// AB070619 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421 /// ENSMUST00000027040 // Rb1cc1 // RB1-inducible coiled-coil 1 // 1 A2 // 12421"
A bad "description" looks like this:
"-----"
though it can quite easily be some other not-very-helpful string. Each probe is simply a number. The probe and description vectors are the same length, and completely correspond to each other, i.e. probe[i] maps to description[i].
It's usually better in R if you use the various apply-like functions, rather than a loop. I think this solves your problem; the only drawback is that you have to use string keys.
> descriptions <- c("foo//bar", "")
> probes <- c(10, 20)
> probe2gene <- lapply(strsplit(descriptions, "//"), function (x) x[2])
> names(probe2gene) <- probes
> probe2gene <- probe2gene[!is.na(probe2gene)]
> probe2gene[["10"]]
[1] "bar"
Unfortunately, R doesn't have a good dictionary/map type. The closest I've found is using lists as a map from string-to-value. That seems to be idiomatic, but it's ugly.
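When every value is a single string, a named character vector is another possible stand-in for a string-keyed map. The sketch below is an assumption-laden variant of the code above: the probes and descriptions values are invented, and trimws() is added because the sample descriptions in the question have spaces around the "//" separators.

```r
# Invented sample data in the format described in the question
probes <- c(10, 20, 30)
descriptions <- c("NM_1 // Rb1cc1 // desc", "-----", "NM_2 // Abc1 // other")

# Split once for the whole vector; fixed = TRUE treats "//" literally
parts <- strsplit(descriptions, "//", fixed = TRUE)

# Second field of each description, NA where there is none
genes <- trimws(vapply(parts, function(x) x[2], character(1)))

# Name by probe (names are coerced to strings) and drop the NAs
probe2gene <- setNames(genes, probes)[!is.na(genes)]

probe2gene[["10"]]
```

As with the list version, lookups use string keys such as "10", not the number 10.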
If I understand correctly, you are looking to save each probe-description combination where there is more than one (split) value in the description?
Probe and Description are the same length?
This is kind of messy but a quick first pass at it?
a <- list("a","b","c")
b <- list(c("a","b"),c("DEF","ABC"),c("Z"))
names(b) <- a
matches <- which(lapply(b, length)>1) #several ways to do this
b <- lapply(b[matches], function(x) x[2]) #keeps the second element only
That's my first attempt. If you have a sample dataset that would be very useful.
Best regards,
Jay
Another way.
probe<-c(4,3,1)
gene<-c('red//hair','strange','blue//blood')
probe2gene<-character()
probe2gene[probe]<-sapply(strsplit(gene,'//'),'[',2)
probe2gene
[1] "blood" NA NA "hair"
In the sapply, we take advantage of the fact that in R the subsetting operator is also a function named '[' to which we can pass the index as an argument. Also, an out-of-range index does not cause an error but gives an NA value. On the left-hand side of the same line, we use the fact that we can pass a vector of indices in any order and with gaps.
Here's another approach that should be fast. Note that this doesn't
remove the empty descriptions. It could be adapted to do that or you
could clean those in a post processing step using lapply. Is it the
case that you'll never have a valid description of length one?
make_desc <- function(n)
{
word <- function(x) paste(sample(letters, 5, replace=TRUE), collapse = "")
if (runif(1) < 0.70)
paste(sapply(seq_len(n), word), collapse = "//")
else
"----"
}
description <- sapply(seq_len(10), make_desc)
probes <- seq_len(length(description))
desc_parts <- strsplit(description, "//", fixed=TRUE, useBytes=TRUE)
lens <- sapply(desc_parts, length)
probes_expand <- rep(probes, lens)
ans <- split(unlist(desc_parts), probes_expand)
> description
[1] "fmbec"
[2] "----"
[3] "----"
[4] "frrii//yjxsa//wvkce//xbpkc"
[5] "kazzp//ifrlz//ztnkh//dtwow//aqvcm"
[6] "stupm//ncqhx//zaakn//kjymf//swvsr//zsexu"
[7] "wajit//sajgr//cttzf//uagwy//qtuyh//iyiue//xelrq"
[8] "nirex//awvnw//bvexw//mmzdp//lvetr//xvahy//qhgym//ggdax"
[9] "----"
[10] "ubabx//tvqrd//vcxsp//rjshu//gbmvj//fbkea//smrgm//qfmpy//tpudu//qpjbu"
> ans[[3]]
[1] "----"
> ans[[4]]
[1] "frrii" "yjxsa" "wvkce" "xbpkc"
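The post-processing step mentioned above could look roughly like the sketch below. It assumes, as the answer asks, that a valid description never has length one, so any single-part entry is treated as empty. The small ans list is a hand-written excerpt in the shape that split() produces, and kept and second are illustrative names.

```r
# Hand-written excerpt shaped like the ans list above
ans <- list(`1` = "fmbec",
            `2` = "----",
            `4` = c("frrii", "yjxsa", "wvkce", "xbpkc"))

# Keep only the probes whose description split into more than one part
kept <- ans[lengths(ans) > 1]

# Second part of each remaining description, named by probe
second <- vapply(kept, `[`, character(1), 2)
second
```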

Resources